2025-09-07T06:38:43.9422279Z Current runner version: '2.328.0'
2025-09-07T06:38:43.9428941Z Runner name: 'gpu6c07'
2025-09-07T06:38:43.9429719Z Runner group name: 'linux.rocm.gpu.group'
2025-09-07T06:38:43.9430775Z Machine name: 'gpu6c07'
2025-09-07T06:38:43.9433714Z ##[group]GITHUB_TOKEN Permissions
2025-09-07T06:38:43.9435833Z Contents: read
2025-09-07T06:38:43.9436422Z Metadata: read
2025-09-07T06:38:43.9437065Z ##[endgroup]
2025-09-07T06:38:43.9439101Z Secret source: Actions
2025-09-07T06:38:43.9439845Z Prepare workflow directory
2025-09-07T06:38:44.3905108Z Prepare all required actions
2025-09-07T06:38:44.3960953Z Getting action download info
2025-09-07T06:38:44.7467721Z Download action repository 'pytorch/pytorch@main' (SHA:93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T06:38:49.2918466Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-09-07T06:38:49.7974216Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-09-07T06:38:50.1937781Z Download action repository 'pytorch/test-infra@main' (SHA:548a4bc624d43a01cdf165a63b041f0ae014ddbd)
2025-09-07T06:38:51.1020642Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-09-07T06:38:51.6966718Z Getting action download info
2025-09-07T06:38:51.8235093Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955)
2025-09-07T06:38:52.3458736Z Getting action download info
2025-09-07T06:38:52.5031789Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-09-07T06:38:52.9272004Z Getting action download info
2025-09-07T06:38:53.0877479Z Uses: pytorch/pytorch/.github/workflows/_rocm-test.yml@refs/heads/main (93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T06:38:53.0881745Z ##[group] Inputs
2025-09-07T06:38:53.0882098Z build-environment: linux-jammy-rocm-py3.10
2025-09-07T06:38:53.0882886Z test-matrix: {"include": [{"config": "slow", "shard": 1, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}, {"config": "slow", "shard": 2, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}]}
2025-09-07T06:38:53.0883977Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77
2025-09-07T06:38:53.0884628Z sync-tag:
2025-09-07T06:38:53.0885397Z timeout-minutes: 300
2025-09-07T06:38:53.0885645Z tests-to-include:
2025-09-07T06:38:53.0885861Z dashboard-tag:
2025-09-07T06:38:53.0886378Z disable-monitor: true
2025-09-07T06:38:53.0886651Z monitor-log-interval: 5
2025-09-07T06:38:53.0886907Z monitor-data-collect-interval: 1
2025-09-07T06:38:53.0887189Z ##[endgroup]
2025-09-07T06:38:53.0887581Z Complete job name: linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm)
2025-09-07T06:38:53.2852312Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-09-07T06:38:53.2853016Z with:
2025-09-07T06:38:53.2853201Z no-sudo: true
2025-09-07T06:38:53.2853424Z submodules: recursive
2025-09-07T06:38:53.2853660Z fetch-depth: 0
2025-09-07T06:38:53.2854142Z env:
2025-09-07T06:38:53.2854327Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:53.2854549Z ##[endgroup]
2025-09-07T06:38:53.2939526Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:38:53.2940392Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:38:53.2985785Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:38:53.2986131Z env:
2025-09-07T06:38:53.2986320Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:53.2986536Z ##[endgroup]
2025-09-07T06:38:53.3229495Z ##[group]Run # Use all available CPUs for fetching
2025-09-07T06:38:53.3230242Z # Use all available CPUs for fetching
2025-09-07T06:38:53.3230546Z cd "${GITHUB_WORKSPACE}"
2025-09-07T06:38:53.3230838Z git config --global fetch.parallel 0
2025-09-07T06:38:53.3231151Z git config --global submodule.fetchJobs 0
2025-09-07T06:38:53.3231436Z
2025-09-07T06:38:53.3231721Z # Clean workspace. The default checkout action should also do this, but
2025-09-07T06:38:53.3232110Z # do it here as well just in case
2025-09-07T06:38:53.3232372Z if [[ -d .git ]]; then
2025-09-07T06:38:53.3232615Z  if [ -z "${NO_SUDO}" ]; then
2025-09-07T06:38:53.3232861Z  sudo git clean -ffdx
2025-09-07T06:38:53.3233103Z  else
2025-09-07T06:38:53.3233293Z  git clean -ffdx
2025-09-07T06:38:53.3233503Z  fi
2025-09-07T06:38:53.3233686Z fi
2025-09-07T06:38:53.3274049Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:38:53.3274416Z env:
2025-09-07T06:38:53.3274596Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:53.3274817Z NO_SUDO: true
2025-09-07T06:38:53.3275097Z ##[endgroup]
2025-09-07T06:38:53.7072619Z Removing .additional_ci_files/
2025-09-07T06:38:53.7072999Z Removing build/
2025-09-07T06:38:53.7073200Z Removing dist/
2025-09-07T06:38:53.7073411Z Removing test/test-reports/
2025-09-07T06:38:53.7142551Z ##[group]Run actions/checkout@v4
2025-09-07T06:38:53.7142838Z with:
2025-09-07T06:38:53.7143076Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634
2025-09-07T06:38:53.7143381Z fetch-depth: 0
2025-09-07T06:38:53.7143593Z submodules: recursive
2025-09-07T06:38:53.7143832Z show-progress: false
2025-09-07T06:38:53.7144058Z repository: pytorch/pytorch
2025-09-07T06:38:53.7144431Z token: ***
2025-09-07T06:38:53.7144625Z ssh-strict: true
2025-09-07T06:38:53.7144833Z ssh-user: git
2025-09-07T06:38:53.7145031Z persist-credentials: true
2025-09-07T06:38:53.7145273Z clean: true
2025-09-07T06:38:53.7145507Z sparse-checkout-cone-mode: true 2025-09-07T06:38:53.7145756Z fetch-tags: false 2025-09-07T06:38:53.7145956Z lfs: false 2025-09-07T06:38:53.7146143Z set-safe-directory: true 2025-09-07T06:38:53.7146367Z env: 2025-09-07T06:38:53.7146537Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:38:53.7146761Z ##[endgroup] 2025-09-07T06:38:53.8301843Z Syncing repository: pytorch/pytorch 2025-09-07T06:38:53.8303100Z ##[group]Getting Git version info 2025-09-07T06:38:53.8303590Z Working directory is '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch' 2025-09-07T06:38:53.8304200Z [command]/usr/bin/git version 2025-09-07T06:38:53.8315738Z git version 2.34.1 2025-09-07T06:38:53.8341553Z ##[endgroup] 2025-09-07T06:38:53.8359063Z Copying '/var/home/pytorchci/.gitconfig' to '/var/home/pytorchci/actions-runner/_work/_temp/31393713-3575-4b49-8de6-61a2b8e5afb0/.gitconfig' 2025-09-07T06:38:53.8369389Z Temporarily overriding HOME='/var/home/pytorchci/actions-runner/_work/_temp/31393713-3575-4b49-8de6-61a2b8e5afb0' before making global git config changes 2025-09-07T06:38:53.8370337Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T06:38:53.8373640Z [command]/usr/bin/git config --global --add safe.directory /var/home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-09-07T06:38:53.8425281Z [command]/usr/bin/git config --local --get remote.origin.url 2025-09-07T06:38:53.8450543Z https://github.com/pytorch/pytorch 2025-09-07T06:38:53.8465727Z ##[group]Removing previously created refs, to avoid conflicts 2025-09-07T06:38:53.8468901Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-09-07T06:38:53.8500892Z HEAD 2025-09-07T06:38:53.8546161Z ##[endgroup] 2025-09-07T06:38:53.8548549Z [command]/usr/bin/git submodule status 2025-09-07T06:38:53.8984750Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 2025-09-07T06:38:53.9114639Z 
4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081) 2025-09-07T06:38:53.9249912Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-09-07T06:38:53.9406956Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-09-07T06:38:53.9476333Z 2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07 third_party/NVTX (v3.1.0-263-g2942f16) 2025-09-07T06:38:53.9580632Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-09-07T06:38:54.0150056Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-09-07T06:38:54.0201939Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-09-07T06:38:54.0239529Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-09-07T06:38:54.0338973Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-09-07T06:38:54.0498135Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0) 2025-09-07T06:38:54.0667737Z 5e3d2445e6a84d9599bee2bf78edbb4d80865e1d third_party/cpuinfo (5e3d244) 2025-09-07T06:38:54.0722059Z f937055efc6d414d11f4c6577e3977fe74f35fb6 third_party/cudnn_frontend (v0.5-52-gf937055) 2025-09-07T06:38:54.0849376Z e51efbfe18fe4f4cbb66ab814c55bf4aa0185491 third_party/cutlass (v4.1.0) 2025-09-07T06:38:54.0910555Z 21c7d30c526c0f1ad873ecc632dca6cfa8a69067 third_party/fbgemm (v1.3.0-rc1-165-g21c7d30c) 2025-09-07T06:38:54.1029297Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-09-07T06:38:54.1067379Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-09-07T06:38:54.1509556Z 40626af88bd7df9a5fb80be7b25ac85b122d6c21 third_party/fmt (11.2.0) 2025-09-07T06:38:54.1669523Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 2025-09-07T06:38:54.1844744Z 
c7b7b022c124d9643957d9bd55f57ac59fce8fa2 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-33-gc7b7b02) 2025-09-07T06:38:54.2098614Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-09-07T06:38:54.2210128Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-09-07T06:38:54.2297490Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-09-07T06:38:54.2598286Z 5e7501833f1021ce6f618572d3baf657b6319658 third_party/kineto (remotes/origin/sraikund/test-98-g5e75018) 2025-09-07T06:38:54.2636328Z cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7 third_party/kleidiai (v1.8.0) 2025-09-07T06:38:54.2674526Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-09-07T06:38:54.2711745Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-09-07T06:38:54.3069887Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-09-07T06:38:54.3105033Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-09-07T06:38:54.3148866Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-09-07T06:38:54.3559428Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b) 2025-09-07T06:38:54.3673694Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-09-07T06:38:54.3751113Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-09-07T06:38:54.3786290Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1) 2025-09-07T06:38:54.3899719Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-09-07T06:38:54.4003100Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-09-07T06:38:54.4114523Z af0118d13e52f5a08841464a768e01a0bf3e3075 third_party/tensorpipe (heads/main) 2025-09-07T06:38:54.4135558Z 
##[group]Cleaning the repository 2025-09-07T06:38:54.4140138Z [command]/usr/bin/git clean -ffdx 2025-09-07T06:38:54.4492630Z [command]/usr/bin/git reset --hard HEAD 2025-09-07T06:38:54.5444082Z HEAD is now at 9aedb3cd87b [AOTI-FX] Support registering custom FX backends (#162317) 2025-09-07T06:38:54.5488810Z ##[endgroup] 2025-09-07T06:38:54.5491041Z ##[group]Disabling automatic garbage collection 2025-09-07T06:38:54.5495953Z [command]/usr/bin/git config --local gc.auto 0 2025-09-07T06:38:54.5537957Z ##[endgroup] 2025-09-07T06:38:54.5538539Z ##[group]Setting up auth 2025-09-07T06:38:54.5545329Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T06:38:54.5599065Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T06:38:54.5985036Z Entering 'android/libs/fbjni' 2025-09-07T06:38:54.6054818Z Entering 'third_party/FP16' 2025-09-07T06:38:54.6125915Z Entering 'third_party/FXdiv' 2025-09-07T06:38:54.6194697Z Entering 'third_party/NNPACK' 2025-09-07T06:38:54.6267117Z Entering 'third_party/NVTX' 2025-09-07T06:38:54.6346641Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:38:54.6425053Z Entering 'third_party/XNNPACK' 2025-09-07T06:38:54.6509785Z Entering 'third_party/aiter' 2025-09-07T06:38:54.6580677Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:38:54.6654529Z Entering 'third_party/benchmark' 2025-09-07T06:38:54.6724303Z Entering 'third_party/composable_kernel' 2025-09-07T06:38:54.6805346Z Entering 'third_party/cpp-httplib' 2025-09-07T06:38:54.6870992Z Entering 'third_party/cpuinfo' 2025-09-07T06:38:54.6942685Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:38:54.7006746Z Entering 'third_party/cutlass' 2025-09-07T06:38:54.7087848Z Entering 'third_party/fbgemm' 2025-09-07T06:38:54.7162388Z Entering 'third_party/fbgemm/external/asmjit' 
2025-09-07T06:38:54.7229623Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:38:54.7307902Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:38:54.7369895Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:38:54.7449385Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:38:54.7519759Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:38:54.7588854Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:38:54.7654476Z Entering 'third_party/flash-attention' 2025-09-07T06:38:54.7725249Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:38:54.7799719Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:38:54.7883290Z Entering 'third_party/flatbuffers' 2025-09-07T06:38:54.7957357Z Entering 'third_party/fmt' 2025-09-07T06:38:54.8030858Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:38:54.8109488Z Entering 'third_party/gloo' 2025-09-07T06:38:54.8169162Z Entering 'third_party/googletest' 2025-09-07T06:38:54.8239579Z Entering 'third_party/ideep' 2025-09-07T06:38:54.8308880Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:38:54.8383797Z Entering 'third_party/ittapi' 2025-09-07T06:38:54.8445694Z Entering 'third_party/kineto' 2025-09-07T06:38:54.8513488Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:38:54.8576568Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:38:54.8645141Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:38:54.8712527Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:38:54.8779564Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:38:54.8843735Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:38:54.8914119Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-09-07T06:38:54.8984652Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:38:54.9046133Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:38:54.9114370Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:38:54.9183093Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:38:54.9242108Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:38:54.9306403Z Entering 'third_party/kleidiai' 2025-09-07T06:38:54.9373304Z Entering 'third_party/mimalloc' 2025-09-07T06:38:54.9443542Z Entering 'third_party/nlohmann' 2025-09-07T06:38:54.9513447Z Entering 'third_party/onnx' 2025-09-07T06:38:54.9603366Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:38:54.9679584Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:38:54.9751914Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:38:54.9822158Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:38:54.9881073Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:38:54.9942834Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:38:55.0005172Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:38:55.0071122Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:38:55.0141107Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:38:55.0201277Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:38:55.0265601Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:38:55.0340458Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:38:55.0427408Z Entering 'third_party/pocketfft' 2025-09-07T06:38:55.0504191Z Entering 'third_party/protobuf' 
2025-09-07T06:38:55.0573439Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:38:55.0637976Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:38:55.0709878Z Entering 'third_party/psimd' 2025-09-07T06:38:55.0784018Z Entering 'third_party/pthreadpool' 2025-09-07T06:38:55.0851687Z Entering 'third_party/pybind11' 2025-09-07T06:38:55.0921254Z Entering 'third_party/python-peachpy' 2025-09-07T06:38:55.0985987Z Entering 'third_party/sleef' 2025-09-07T06:38:55.1063692Z Entering 'third_party/tensorpipe' 2025-09-07T06:38:55.1122619Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:38:55.1186935Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:38:55.1243547Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:38:55.1307888Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:38:55.1376501Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:38:55.1473075Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T06:38:55.1513360Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T06:38:55.1885327Z Entering 'android/libs/fbjni' 2025-09-07T06:38:55.1955827Z Entering 'third_party/FP16' 2025-09-07T06:38:55.2027829Z Entering 'third_party/FXdiv' 2025-09-07T06:38:55.2106318Z Entering 'third_party/NNPACK' 2025-09-07T06:38:55.2175122Z Entering 'third_party/NVTX' 2025-09-07T06:38:55.2245286Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:38:55.2316907Z Entering 'third_party/XNNPACK' 2025-09-07T06:38:55.2405101Z Entering 'third_party/aiter' 2025-09-07T06:38:55.2470337Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:38:55.2552149Z Entering 'third_party/benchmark' 
2025-09-07T06:38:55.2626888Z Entering 'third_party/composable_kernel' 2025-09-07T06:38:55.2698675Z Entering 'third_party/cpp-httplib' 2025-09-07T06:38:55.2768208Z Entering 'third_party/cpuinfo' 2025-09-07T06:38:55.2838103Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:38:55.2908799Z Entering 'third_party/cutlass' 2025-09-07T06:38:55.2998371Z Entering 'third_party/fbgemm' 2025-09-07T06:38:55.3072823Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:38:55.3146930Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:38:55.3218425Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:38:55.3283578Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:38:55.3361327Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:38:55.3430002Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:38:55.3485725Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:38:55.3557406Z Entering 'third_party/flash-attention' 2025-09-07T06:38:55.3629046Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:38:55.3698391Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:38:55.3772578Z Entering 'third_party/flatbuffers' 2025-09-07T06:38:55.3848573Z Entering 'third_party/fmt' 2025-09-07T06:38:55.3919210Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:38:55.3991580Z Entering 'third_party/gloo' 2025-09-07T06:38:55.4065926Z Entering 'third_party/googletest' 2025-09-07T06:38:55.4132603Z Entering 'third_party/ideep' 2025-09-07T06:38:55.4201953Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:38:55.4281579Z Entering 'third_party/ittapi' 2025-09-07T06:38:55.4354214Z Entering 'third_party/kineto' 2025-09-07T06:38:55.4428992Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:38:55.4489396Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:38:55.4554865Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:38:55.4621753Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:38:55.4681325Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:38:55.4740540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:38:55.4807227Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:38:55.4873510Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:38:55.4932985Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:38:55.4997738Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:38:55.5065516Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:38:55.5122765Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:38:55.5188049Z Entering 'third_party/kleidiai' 2025-09-07T06:38:55.5256539Z Entering 'third_party/mimalloc' 2025-09-07T06:38:55.5323853Z Entering 'third_party/nlohmann' 2025-09-07T06:38:55.5398066Z Entering 'third_party/onnx' 2025-09-07T06:38:55.5483912Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:38:55.5559698Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:38:55.5632204Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:38:55.5702073Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:38:55.5764004Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:38:55.5825555Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:38:55.5896963Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:38:55.5957630Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:38:55.6024799Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:38:55.6078253Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:38:55.6148119Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:38:55.6220337Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:38:55.6305814Z Entering 'third_party/pocketfft' 2025-09-07T06:38:55.6383164Z Entering 'third_party/protobuf' 2025-09-07T06:38:55.6449779Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:38:55.6517364Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:38:55.6590011Z Entering 'third_party/psimd' 2025-09-07T06:38:55.6661651Z Entering 'third_party/pthreadpool' 2025-09-07T06:38:55.6725607Z Entering 'third_party/pybind11' 2025-09-07T06:38:55.6795108Z Entering 'third_party/python-peachpy' 2025-09-07T06:38:55.6866120Z Entering 'third_party/sleef' 2025-09-07T06:38:55.6933538Z Entering 'third_party/tensorpipe' 2025-09-07T06:38:55.7002863Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:38:55.7066994Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:38:55.7127013Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:38:55.7196030Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:38:55.7264511Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:38:55.7359400Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T06:38:55.7407027Z ##[endgroup] 2025-09-07T06:38:55.7407431Z ##[group]Fetching the repository 2025-09-07T06:38:55.7414425Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-09-07T06:38:56.3053437Z From https://github.com/pytorch/pytorch 2025-09-07T06:38:56.3054487Z - [deleted] (none) -> 
origin/export-D79751098 2025-09-07T06:38:56.3825352Z - [deleted] (none) -> origin/gh/XilunWu/162/base 2025-09-07T06:38:56.3826802Z - [deleted] (none) -> origin/gh/XilunWu/162/head 2025-09-07T06:38:56.3828378Z - [deleted] (none) -> origin/gh/XilunWu/162/orig 2025-09-07T06:38:56.3829947Z - [deleted] (none) -> origin/gh/anijain2305/847/base 2025-09-07T06:38:56.3831529Z - [deleted] (none) -> origin/gh/anijain2305/847/head 2025-09-07T06:38:56.3833142Z - [deleted] (none) -> origin/gh/anijain2305/847/orig 2025-09-07T06:38:56.3834755Z - [deleted] (none) -> origin/gh/ankitageorge/18/base 2025-09-07T06:38:56.3836100Z - [deleted] (none) -> origin/gh/ankitageorge/18/head 2025-09-07T06:38:56.3837828Z - [deleted] (none) -> origin/gh/ankitageorge/18/orig 2025-09-07T06:38:56.3839574Z - [deleted] (none) -> origin/gh/ankitageorge/19/base 2025-09-07T06:38:56.3841063Z - [deleted] (none) -> origin/gh/ankitageorge/19/head 2025-09-07T06:38:56.3842656Z - [deleted] (none) -> origin/gh/ankitageorge/19/orig 2025-09-07T06:38:56.3844258Z - [deleted] (none) -> origin/gh/ankitageorge/20/base 2025-09-07T06:38:56.3845872Z - [deleted] (none) -> origin/gh/ankitageorge/20/head 2025-09-07T06:38:56.3847458Z - [deleted] (none) -> origin/gh/ankitageorge/20/orig 2025-09-07T06:38:56.3849476Z - [deleted] (none) -> origin/gh/anshul-si/10/base 2025-09-07T06:38:56.3850713Z - [deleted] (none) -> origin/gh/anshul-si/10/head 2025-09-07T06:38:56.3852313Z - [deleted] (none) -> origin/gh/anshul-si/10/orig 2025-09-07T06:38:56.3853972Z - [deleted] (none) -> origin/gh/anshul-si/11/base 2025-09-07T06:38:56.3855612Z - [deleted] (none) -> origin/gh/anshul-si/11/head 2025-09-07T06:38:56.3857226Z - [deleted] (none) -> origin/gh/anshul-si/11/orig 2025-09-07T06:38:56.3858842Z - [deleted] (none) -> origin/gh/anshul-si/12/base 2025-09-07T06:38:56.3860451Z - [deleted] (none) -> origin/gh/anshul-si/12/head 2025-09-07T06:38:56.3862054Z - [deleted] (none) -> origin/gh/anshul-si/12/orig 2025-09-07T06:38:56.3863640Z - [deleted] 
(none) -> origin/gh/anshul-si/13/base 2025-09-07T06:38:56.3865288Z - [deleted] (none) -> origin/gh/anshul-si/13/head 2025-09-07T06:38:56.3869181Z - [deleted] (none) -> origin/gh/anshul-si/13/orig 2025-09-07T06:38:56.3869549Z - [deleted] (none) -> origin/gh/anshul-si/14/base 2025-09-07T06:38:56.3870031Z - [deleted] (none) -> origin/gh/anshul-si/14/head 2025-09-07T06:38:56.3871718Z - [deleted] (none) -> origin/gh/anshul-si/14/orig 2025-09-07T06:38:56.3873351Z - [deleted] (none) -> origin/gh/anshul-si/9/base 2025-09-07T06:38:56.3874940Z - [deleted] (none) -> origin/gh/anshul-si/9/head 2025-09-07T06:38:56.3876547Z - [deleted] (none) -> origin/gh/anshul-si/9/orig 2025-09-07T06:38:56.3878164Z - [deleted] (none) -> origin/gh/colinchan15/4/base 2025-09-07T06:38:56.3879754Z - [deleted] (none) -> origin/gh/colinchan15/4/head 2025-09-07T06:38:56.3881381Z - [deleted] (none) -> origin/gh/colinchan15/5/base 2025-09-07T06:38:56.3883053Z - [deleted] (none) -> origin/gh/colinchan15/5/head 2025-09-07T06:38:56.3884637Z - [deleted] (none) -> origin/gh/drisspg/150/base 2025-09-07T06:38:56.3886267Z - [deleted] (none) -> origin/gh/drisspg/150/head 2025-09-07T06:38:56.3887889Z - [deleted] (none) -> origin/gh/drisspg/150/orig 2025-09-07T06:38:56.3889524Z - [deleted] (none) -> origin/gh/drisspg/151/base 2025-09-07T06:38:56.3891156Z - [deleted] (none) -> origin/gh/drisspg/151/head 2025-09-07T06:38:56.3892806Z - [deleted] (none) -> origin/gh/drisspg/151/orig 2025-09-07T06:38:56.3894496Z - [deleted] (none) -> origin/gh/ezyang/3068/base 2025-09-07T06:38:56.3896116Z - [deleted] (none) -> origin/gh/ezyang/3068/head 2025-09-07T06:38:56.3897747Z - [deleted] (none) -> origin/gh/ezyang/3068/orig 2025-09-07T06:38:56.3899381Z - [deleted] (none) -> origin/gh/jiayisunx/57/base 2025-09-07T06:38:56.3901013Z - [deleted] (none) -> origin/gh/jiayisunx/57/head 2025-09-07T06:38:56.3902639Z - [deleted] (none) -> origin/gh/jiayisunx/57/orig 2025-09-07T06:38:56.3904265Z - [deleted] (none) -> 
origin/gh/malfet/457/base 2025-09-07T06:38:56.3905901Z - [deleted] (none) -> origin/gh/malfet/457/head 2025-09-07T06:38:56.3907548Z - [deleted] (none) -> origin/gh/malfet/457/orig 2025-09-07T06:38:56.3909152Z - [deleted] (none) -> origin/gh/v0i0/2/base 2025-09-07T06:38:56.3910805Z - [deleted] (none) -> origin/gh/v0i0/2/head 2025-09-07T06:38:56.3912409Z - [deleted] (none) -> origin/gh/v0i0/2/orig 2025-09-07T06:38:56.3914061Z - [deleted] (none) -> origin/gh/v0i0/3/base 2025-09-07T06:38:56.3915678Z - [deleted] (none) -> origin/gh/v0i0/3/head 2025-09-07T06:38:56.3917482Z - [deleted] (none) -> origin/gh/v0i0/3/orig 2025-09-07T06:38:56.3918996Z - [deleted] (none) -> origin/gh/wconstab/439/base 2025-09-07T06:38:56.3920569Z - [deleted] (none) -> origin/gh/wconstab/439/head 2025-09-07T06:38:56.3922195Z - [deleted] (none) -> origin/gh/wconstab/439/orig 2025-09-07T06:38:56.3923819Z - [deleted] (none) -> origin/gh/xmfan/275/base 2025-09-07T06:38:56.3925470Z - [deleted] (none) -> origin/gh/xmfan/275/head 2025-09-07T06:38:56.3927079Z - [deleted] (none) -> origin/gh/xmfan/275/orig 2025-09-07T06:38:56.3928735Z - [deleted] (none) -> origin/gh/yangw-dev/1/base 2025-09-07T06:38:56.3930319Z - [deleted] (none) -> origin/gh/yangw-dev/10/base 2025-09-07T06:38:56.3931965Z - [deleted] (none) -> origin/gh/yangw-dev/10/head 2025-09-07T06:38:56.3933603Z - [deleted] (none) -> origin/gh/yangw-dev/10/orig 2025-09-07T06:38:56.3935343Z - [deleted] (none) -> origin/gh/yangw-dev/2/base 2025-09-07T06:38:56.3936930Z - [deleted] (none) -> origin/gh/yangw-dev/2/head 2025-09-07T06:38:56.3938535Z - [deleted] (none) -> origin/gh/yangw-dev/4/base 2025-09-07T06:38:56.3940151Z - [deleted] (none) -> origin/gh/yangw-dev/4/head 2025-09-07T06:38:56.3941800Z - [deleted] (none) -> origin/gh/zou3519/1190/base 2025-09-07T06:38:56.3943434Z - [deleted] (none) -> origin/gh/zou3519/1190/head 2025-09-07T06:38:56.3945065Z - [deleted] (none) -> origin/gh/zou3519/1190/orig 2025-09-07T06:38:56.3946684Z - [deleted] (none) -> 
origin/malfet-patch-7 2025-09-07T06:38:56.3948367Z - [deleted] (none) -> origin/update-audio-commit-hash/16791960928-1711-1 2025-09-07T06:38:56.3949952Z - [deleted] (none) -> ciflow/h100-symm-mem/161601 2025-09-07T06:38:56.3951584Z - [deleted] (none) -> ciflow/inductor-rocm/161700 2025-09-07T06:38:56.3953206Z - [deleted] (none) -> ciflow/inductor/160467 2025-09-07T06:38:56.3954828Z - [deleted] (none) -> ciflow/inductor/160483 2025-09-07T06:38:56.3956447Z - [deleted] (none) -> ciflow/inductor/161601 2025-09-07T06:38:56.3958090Z - [deleted] (none) -> ciflow/inductor/162167 2025-09-07T06:38:56.3959692Z - [deleted] (none) -> ciflow/inductor/162221 2025-09-07T06:38:56.3961383Z - [deleted] (none) -> ciflow/inductor/162247 2025-09-07T06:38:56.3962972Z - [deleted] (none) -> ciflow/inductor/162285 2025-09-07T06:38:56.3964572Z - [deleted] (none) -> ciflow/inductor/162303 2025-09-07T06:38:56.3966211Z - [deleted] (none) -> ciflow/inductor/162314 2025-09-07T06:38:56.3967864Z - [deleted] (none) -> ciflow/periodic-rocm-mi300/161700 2025-09-07T06:38:56.3969465Z - [deleted] (none) -> ciflow/rocm-mi300/161700 2025-09-07T06:38:56.3971101Z - [deleted] (none) -> ciflow/rocm/161700 2025-09-07T06:38:56.3972724Z - [deleted] (none) -> ciflow/trunk/160467 2025-09-07T06:38:56.3974863Z - [deleted] (none) -> ciflow/trunk/160483 2025-09-07T06:38:56.3978772Z - [deleted] (none) -> ciflow/trunk/160907 2025-09-07T06:38:56.3980365Z - [deleted] (none) -> ciflow/trunk/162167 2025-09-07T06:38:56.3981987Z - [deleted] (none) -> ciflow/trunk/162209 2025-09-07T06:38:56.3983607Z - [deleted] (none) -> ciflow/trunk/162221 2025-09-07T06:38:56.3985252Z - [deleted] (none) -> ciflow/trunk/162247 2025-09-07T06:38:56.3986902Z - [deleted] (none) -> ciflow/trunk/162285 2025-09-07T06:38:56.3988686Z - [deleted] (none) -> ciflow/trunk/162301 2025-09-07T06:38:56.3990164Z - [deleted] (none) -> ciflow/trunk/162314 2025-09-07T06:38:56.3991768Z - [deleted] (none) -> ciflow/trunk/162322 2025-09-07T06:38:56.3993396Z - 
[deleted] (none) -> ciflow/vllm/162000 2025-09-07T06:38:56.3995044Z - [deleted] (none) -> ciflow/xpu/139971 2025-09-07T06:38:56.3996654Z - [deleted] (none) -> ciflow/xpu/161601 2025-09-07T06:38:56.3998357Z - [deleted] (none) -> trunk/2d31c3d99d9a0b71d6939b0d6961fe6f99838ba9 2025-09-07T06:38:56.3999962Z - [deleted] (none) -> trunk/2e1345a0f8427ecf4eabfc1e3aa1b46787c47467 2025-09-07T06:38:56.4001598Z - [deleted] (none) -> trunk/2fed4fb464d87fe7cc2ff646ec2bb8052e76c729 2025-09-07T06:38:56.4003238Z - [deleted] (none) -> trunk/37da7b777b06e4a0f8e6192dd2a7e9047194fbf3 2025-09-07T06:38:56.4004877Z - [deleted] (none) -> trunk/4d3ab2669b3839b53361ebc5c8d53bcc819b4876 2025-09-07T06:38:56.4006490Z - [deleted] (none) -> trunk/684ae48c160364ea46c77050a7fa24c13a751df2 2025-09-07T06:38:56.4008127Z - [deleted] (none) -> trunk/76f81b56d3f5788d79c4250bae76da8f929ac4ba 2025-09-07T06:38:56.4009742Z - [deleted] (none) -> trunk/77d8e98e1b07797c6730b7ba7c313c984cce4ed3 2025-09-07T06:38:56.4011383Z - [deleted] (none) -> trunk/82d2d23e855007c581b529b43dde397f55f47e43 2025-09-07T06:38:56.4013016Z - [deleted] (none) -> trunk/ad7b748686610e317e5c0cbbd523b7a6e3b8b51f 2025-09-07T06:38:56.4014744Z - [deleted] (none) -> trunk/b93f87d67b874fd4a1c57c89869ae53c4387063c 2025-09-07T06:38:56.4016390Z - [deleted] (none) -> trunk/b994f6e3b331faeac693970bd1e14972f3fc9d4a 2025-09-07T06:38:56.4017991Z - [deleted] (none) -> trunk/c83cbd2f2a2de2e3258f07de77d8740743df6d2d 2025-09-07T06:38:56.4019642Z - [deleted] (none) -> trunk/db622842bc97acc66d1ee31b8ceacd63abea3b55 2025-09-07T06:38:56.4021255Z - [deleted] (none) -> trunk/e015de19695402569e2029429c10508f938b6f05 2025-09-07T06:38:56.4022901Z - [deleted] (none) -> trunk/f3697b033ea44a28caa7bb31cf6357641863f8db 2025-09-07T06:38:56.4024520Z - [deleted] (none) -> trunk/f44ad54bc6edd1b41d9c9b6701c27e3e6e636601 2025-09-07T06:38:56.4026169Z - [deleted] (none) -> trunk/fb2d5ea697a72301d0fb889ead412c6b5ed0d1b8 2025-09-07T06:38:57.6321696Z * [new branch] 
gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-09-07T06:38:57.6327395Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-09-07T06:38:57.6328793Z * [new branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 2025-09-07T06:38:57.6341535Z * [new branch] gh/benjaminglass1/105/base -> origin/gh/benjaminglass1/105/base 2025-09-07T06:38:57.6343048Z * [new branch] gh/benjaminglass1/105/head -> origin/gh/benjaminglass1/105/head 2025-09-07T06:38:57.6344569Z * [new branch] gh/benjaminglass1/105/orig -> origin/gh/benjaminglass1/105/orig 2025-09-07T06:38:57.6346953Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-09-07T06:38:57.6348417Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-09-07T06:38:57.6350005Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-09-07T06:38:57.6375033Z 0a524fd3b98..64a6f0704be gh/kwen2501/231/base -> origin/gh/kwen2501/231/base 2025-09-07T06:38:57.6376875Z 30e0f870366..8f5769ceb79 gh/kwen2501/231/head -> origin/gh/kwen2501/231/head 2025-09-07T06:38:57.6379695Z + e412712e97a...f86c78bdacb gh/kwen2501/231/orig -> origin/gh/kwen2501/231/orig (forced update) 2025-09-07T06:38:57.6380767Z ec5c5d73824..a4ec296b7ee gh/kwen2501/232/base -> origin/gh/kwen2501/232/base 2025-09-07T06:38:57.6382686Z d329963f969..b2d47a2ea60 gh/kwen2501/232/head -> origin/gh/kwen2501/232/head 2025-09-07T06:38:57.6384276Z + f6cad7ad010...80b3a1bca13 gh/kwen2501/232/orig -> origin/gh/kwen2501/232/orig (forced update) 2025-09-07T06:38:57.6389801Z cbf2cc42833..7e2b60cc42b gh/malfet/507/head -> origin/gh/malfet/507/head 2025-09-07T06:38:57.6391691Z + 7a832dd0f9b...b1f24e95856 gh/malfet/507/orig -> origin/gh/malfet/507/orig (forced update) 2025-09-07T06:38:57.6399092Z e689558747f..a6ef630a23f gh/shunting314/215/base -> origin/gh/shunting314/215/base 2025-09-07T06:38:57.6400959Z 370cb6c00fe..5b9ab1de9fb gh/shunting314/215/head -> 
origin/gh/shunting314/215/head 2025-09-07T06:38:57.6403053Z + b9a6fc6e5a7...1d79f6c58d4 gh/shunting314/215/orig -> origin/gh/shunting314/215/orig (forced update) 2025-09-07T06:38:57.6404878Z a80fe16c4ce..dc5e48a2ee5 gh/shunting314/216/base -> origin/gh/shunting314/216/base 2025-09-07T06:38:57.6406582Z 3d2e3d67ae9..ed4c6cf730d gh/shunting314/216/head -> origin/gh/shunting314/216/head 2025-09-07T06:38:57.6408320Z + 34611cfbb98...df17ce5322d gh/shunting314/216/orig -> origin/gh/shunting314/216/orig (forced update) 2025-09-07T06:38:57.6410170Z 1f335cf061a..517cb6f2c03 gh/shunting314/217/base -> origin/gh/shunting314/217/base 2025-09-07T06:38:57.6411984Z fc86e4872f7..c8e45f3eec0 gh/shunting314/217/head -> origin/gh/shunting314/217/head 2025-09-07T06:38:57.6414061Z + a3fc8a9e201...a2ef80de57e gh/shunting314/217/orig -> origin/gh/shunting314/217/orig (forced update) 2025-09-07T06:38:57.6415943Z e2bd0d1fe6c..69402d51ca8 gh/shunting314/218/base -> origin/gh/shunting314/218/base 2025-09-07T06:38:57.6417673Z 8ebcc129485..5804a59dd90 gh/shunting314/218/head -> origin/gh/shunting314/218/head 2025-09-07T06:38:57.6419420Z + 1f1db7414c1...e50ba9fc63c gh/shunting314/218/orig -> origin/gh/shunting314/218/orig (forced update) 2025-09-07T06:38:57.6421710Z * [new branch] gh/shunting314/223/base -> origin/gh/shunting314/223/base 2025-09-07T06:38:57.6423239Z * [new branch] gh/shunting314/223/head -> origin/gh/shunting314/223/head 2025-09-07T06:38:57.6424613Z * [new branch] gh/shunting314/223/orig -> origin/gh/shunting314/223/orig 2025-09-07T06:38:57.6429908Z 7711fe5a903..9e3582c7b9f gh/swolchok/813/base -> origin/gh/swolchok/813/base 2025-09-07T06:38:57.6431707Z ef0b07d631e..9b3873ffcc8 gh/swolchok/813/head -> origin/gh/swolchok/813/head 2025-09-07T06:38:57.6433632Z + 741b4a68c20...6bbd76556b9 gh/swolchok/813/orig -> origin/gh/swolchok/813/orig (forced update) 2025-09-07T06:38:57.6435515Z 91cf4cf3a77..fd2ef70f405 gh/swolchok/814/base -> origin/gh/swolchok/814/base 
2025-09-07T06:38:57.6437268Z 856701342b7..90190d0bacc gh/swolchok/814/head -> origin/gh/swolchok/814/head 2025-09-07T06:38:57.6439096Z + b28b9571e60...39c345eb28c gh/swolchok/814/orig -> origin/gh/swolchok/814/orig (forced update) 2025-09-07T06:38:57.6440944Z 97f877fa547..1f1bfb0ef0c gh/swolchok/815/base -> origin/gh/swolchok/815/base 2025-09-07T06:38:57.6442696Z b2c849ab9b5..f0244b1355e gh/swolchok/815/head -> origin/gh/swolchok/815/head 2025-09-07T06:38:57.6444614Z + b40b73cfc43...70bdc29115a gh/swolchok/815/orig -> origin/gh/swolchok/815/orig (forced update) 2025-09-07T06:38:57.6446573Z ebe377ccc3e..fc0d84afe86 gh/swolchok/817/base -> origin/gh/swolchok/817/base 2025-09-07T06:38:57.6448293Z 157f84c6b56..45947152984 gh/swolchok/817/head -> origin/gh/swolchok/817/head 2025-09-07T06:38:57.6450108Z + 26b4b637a59...b818abf011a gh/swolchok/817/orig -> origin/gh/swolchok/817/orig (forced update) 2025-09-07T06:38:57.6452273Z fa4d97ed851..d545bae1f6d gh/swolchok/818/base -> origin/gh/swolchok/818/base 2025-09-07T06:38:57.6453883Z 43f9288c70e..9fbf9dba50b gh/swolchok/818/head -> origin/gh/swolchok/818/head 2025-09-07T06:38:57.6455772Z + da0f784c885...a25b6c77f49 gh/swolchok/818/orig -> origin/gh/swolchok/818/orig (forced update) 2025-09-07T06:38:57.6457669Z 807901e1775..925f262372f gh/swolchok/820/base -> origin/gh/swolchok/820/base 2025-09-07T06:38:57.6459429Z 6892536f844..24cc0c40e06 gh/swolchok/820/head -> origin/gh/swolchok/820/head 2025-09-07T06:38:57.6461388Z + 3d08cee289b...14a16ad5557 gh/swolchok/820/orig -> origin/gh/swolchok/820/orig (forced update) 2025-09-07T06:38:57.6463272Z 3c9c425235c..f5f291a0d38 gh/swolchok/821/base -> origin/gh/swolchok/821/base 2025-09-07T06:38:57.6464959Z 9974c7d9672..c6efe1ddb98 gh/swolchok/821/head -> origin/gh/swolchok/821/head 2025-09-07T06:38:57.6466797Z + 97939ea078e...3529be1c2ad gh/swolchok/821/orig -> origin/gh/swolchok/821/orig (forced update) 2025-09-07T06:38:57.6468928Z 2111a9cf7af..7664c7b8c23 gh/swolchok/823/base -> 
origin/gh/swolchok/823/base 2025-09-07T06:38:57.6470797Z 12db8395b2a..c3665e33cf4 gh/swolchok/823/head -> origin/gh/swolchok/823/head 2025-09-07T06:38:57.6474130Z + b40511f84a9...7a66db33035 gh/swolchok/823/orig -> origin/gh/swolchok/823/orig (forced update) 2025-09-07T06:38:57.6476139Z 28a66c948dd..456e1cf733f gh/swolchok/826/base -> origin/gh/swolchok/826/base 2025-09-07T06:38:57.6477876Z 77ac6bdb949..1cccbb5dd20 gh/swolchok/826/head -> origin/gh/swolchok/826/head 2025-09-07T06:38:57.6479671Z + 068a2dace83...8a34c9c4e86 gh/swolchok/826/orig -> origin/gh/swolchok/826/orig (forced update) 2025-09-07T06:38:57.6481518Z dcac5d047b0..65cd085e1cd gh/swolchok/827/base -> origin/gh/swolchok/827/base 2025-09-07T06:38:57.6483245Z 144e44cdd58..81fa1339d4c gh/swolchok/827/head -> origin/gh/swolchok/827/head 2025-09-07T06:38:57.6485019Z + e0177d415bd...d993ccaaa2d gh/swolchok/827/orig -> origin/gh/swolchok/827/orig (forced update) 2025-09-07T06:38:57.6486956Z 54c2ff053fb..fb2be21f9be gh/swolchok/828/base -> origin/gh/swolchok/828/base 2025-09-07T06:38:57.6488644Z 863cbbbc12a..331e1fefc6a gh/swolchok/828/head -> origin/gh/swolchok/828/head 2025-09-07T06:38:57.6490382Z + 9144179d0b2...063f62eb6a5 gh/swolchok/828/orig -> origin/gh/swolchok/828/orig (forced update) 2025-09-07T06:38:57.6492321Z 863cbbbc12a..f4bdd1ca9a2 gh/swolchok/830/base -> origin/gh/swolchok/830/base 2025-09-07T06:38:57.6494058Z f766afd56c4..f3e42518e6d gh/swolchok/830/head -> origin/gh/swolchok/830/head 2025-09-07T06:38:57.6495820Z + 6d6aa10a7ac...5015ce5a781 gh/swolchok/830/orig -> origin/gh/swolchok/830/orig (forced update) 2025-09-07T06:38:57.6498058Z * [new branch] gh/swolchok/831/base -> origin/gh/swolchok/831/base 2025-09-07T06:38:57.6499662Z * [new branch] gh/swolchok/831/head -> origin/gh/swolchok/831/head 2025-09-07T06:38:57.6501075Z * [new branch] gh/swolchok/831/orig -> origin/gh/swolchok/831/orig 2025-09-07T06:38:57.6503180Z * [new branch] gh/swolchok/832/base -> origin/gh/swolchok/832/base 
2025-09-07T06:38:57.6504758Z * [new branch] gh/swolchok/832/head -> origin/gh/swolchok/832/head 2025-09-07T06:38:57.6506123Z * [new branch] gh/swolchok/832/orig -> origin/gh/swolchok/832/orig 2025-09-07T06:38:57.6518156Z 174f2faa8c5..636d3aa00f2 install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-09-07T06:38:57.6522464Z 9aedb3cd87b..93fb23d6fae main -> origin/main 2025-09-07T06:38:57.6529590Z + 146a688ae7d...da17d096cc8 update-audio-commit-hash/17507351808-1794-1 -> origin/update-audio-commit-hash/17507351808-1794-1 (forced update) 2025-09-07T06:38:57.6531651Z + cc439d6b7db...3b55f4c020e update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 (forced update) 2025-09-07T06:38:57.6533120Z + 70b69a38616...c30ade1afca update-vision-commit-hash/15336342773-1607-1 -> origin/update-vision-commit-hash/15336342773-1607-1 (forced update) 2025-09-07T06:38:57.6535457Z + 6893497b203...2ace76828ce update-vllm-commit-hash/17507351808-1794-1 -> origin/update-vllm-commit-hash/17507351808-1794-1 (forced update) 2025-09-07T06:38:57.6538629Z t [tag update] ciflow/binaries_wheel/162136 -> ciflow/binaries_wheel/162136 2025-09-07T06:38:57.6540305Z t [tag update] ciflow/h100-symm-mem/162243 -> ciflow/h100-symm-mem/162243 2025-09-07T06:38:57.6541783Z t [tag update] ciflow/h100-symm-mem/162320 -> ciflow/h100-symm-mem/162320 2025-09-07T06:38:57.6543142Z * [new tag] ciflow/inductor-periodic/162227 -> ciflow/inductor-periodic/162227 2025-09-07T06:38:57.6544204Z * [new tag] ciflow/inductor-rocm/154170 -> ciflow/inductor-rocm/154170 2025-09-07T06:38:57.6545955Z t [tag update] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-09-07T06:38:57.6547503Z t [tag update] ciflow/inductor/154694 -> ciflow/inductor/154694 2025-09-07T06:38:57.6549968Z t [tag update] ciflow/inductor/161178 -> ciflow/inductor/161178 2025-09-07T06:38:57.6551869Z t [tag update] ciflow/inductor/161595 -> ciflow/inductor/161595 2025-09-07T06:38:57.6553403Z t [tag update] 
ciflow/inductor/161596 -> ciflow/inductor/161596 2025-09-07T06:38:57.6554417Z * [new tag] ciflow/inductor/161667 -> ciflow/inductor/161667 2025-09-07T06:38:57.6556155Z t [tag update] ciflow/inductor/161693 -> ciflow/inductor/161693 2025-09-07T06:38:57.6557625Z t [tag update] ciflow/inductor/161695 -> ciflow/inductor/161695 2025-09-07T06:38:57.6559734Z t [tag update] ciflow/inductor/162030 -> ciflow/inductor/162030 2025-09-07T06:38:57.6561468Z t [tag update] ciflow/inductor/162101 -> ciflow/inductor/162101 2025-09-07T06:38:57.6563011Z t [tag update] ciflow/inductor/162102 -> ciflow/inductor/162102 2025-09-07T06:38:57.6564591Z t [tag update] ciflow/inductor/162126 -> ciflow/inductor/162126 2025-09-07T06:38:57.6566386Z t [tag update] ciflow/inductor/162220 -> ciflow/inductor/162220 2025-09-07T06:38:57.6567975Z t [tag update] ciflow/inductor/162227 -> ciflow/inductor/162227 2025-09-07T06:38:57.6569666Z t [tag update] ciflow/inductor/162298 -> ciflow/inductor/162298 2025-09-07T06:38:57.6571295Z t [tag update] ciflow/inductor/162315 -> ciflow/inductor/162315 2025-09-07T06:38:57.6572494Z * [new tag] ciflow/inductor/162341 -> ciflow/inductor/162341 2025-09-07T06:38:57.6573483Z * [new tag] ciflow/inductor/162345 -> ciflow/inductor/162345 2025-09-07T06:38:57.6575439Z * [new tag] ciflow/rocm-mi300/154170 -> ciflow/rocm-mi300/154170 2025-09-07T06:38:57.6577054Z t [tag update] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-09-07T06:38:57.6577986Z * [new tag] ciflow/rocm/154170 -> ciflow/rocm/154170 2025-09-07T06:38:57.6580140Z t [tag update] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-09-07T06:38:57.6581284Z * [new tag] ciflow/trunk/154170 -> ciflow/trunk/154170 2025-09-07T06:38:57.6582761Z t [tag update] ciflow/trunk/154694 -> ciflow/trunk/154694 2025-09-07T06:38:57.6584322Z * [new tag] ciflow/trunk/158846 -> ciflow/trunk/158846 2025-09-07T06:38:57.6586017Z t [tag update] ciflow/trunk/161178 -> ciflow/trunk/161178 2025-09-07T06:38:57.6587689Z t [tag update] 
ciflow/trunk/161591 -> ciflow/trunk/161591 2025-09-07T06:38:57.6589265Z t [tag update] ciflow/trunk/161595 -> ciflow/trunk/161595 2025-09-07T06:38:57.6590895Z t [tag update] ciflow/trunk/161596 -> ciflow/trunk/161596 2025-09-07T06:38:57.6592210Z t [tag update] ciflow/trunk/161633 -> ciflow/trunk/161633 2025-09-07T06:38:57.6593694Z t [tag update] ciflow/trunk/161634 -> ciflow/trunk/161634 2025-09-07T06:38:57.6594646Z * [new tag] ciflow/trunk/161667 -> ciflow/trunk/161667 2025-09-07T06:38:57.6596290Z t [tag update] ciflow/trunk/161692 -> ciflow/trunk/161692 2025-09-07T06:38:57.6597721Z t [tag update] ciflow/trunk/161693 -> ciflow/trunk/161693 2025-09-07T06:38:57.6599310Z t [tag update] ciflow/trunk/161695 -> ciflow/trunk/161695 2025-09-07T06:38:57.6600998Z * [new tag] ciflow/trunk/162311 -> ciflow/trunk/162311 2025-09-07T06:38:57.6602474Z t [tag update] ciflow/trunk/162315 -> ciflow/trunk/162315 2025-09-07T06:38:57.6603751Z * [new tag] ciflow/trunk/162328 -> ciflow/trunk/162328 2025-09-07T06:38:57.6605849Z * [new tag] trunk/047603d35bdc70046216384838d6340feab79bf4 -> trunk/047603d35bdc70046216384838d6340feab79bf4 2025-09-07T06:38:57.6607090Z * [new tag] trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f -> trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f 2025-09-07T06:38:57.6608233Z * [new tag] trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 -> trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 2025-09-07T06:38:57.6609761Z * [new tag] trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 -> trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 2025-09-07T06:38:57.6610715Z * [new tag] trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c -> trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c 2025-09-07T06:38:57.6612382Z * [new tag] trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 -> trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 2025-09-07T06:38:57.6613398Z * [new tag] trunk/5927a70934ccf7b70182d364c23245a7dd685503 -> trunk/5927a70934ccf7b70182d364c23245a7dd685503 2025-09-07T06:38:57.6614561Z * [new tag] 
trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 -> trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 2025-09-07T06:38:57.6616747Z * [new tag] trunk/93fb23d6fae7c4e82c4239a1033e522088742634 -> trunk/93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:38:57.6618414Z * [new tag] trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c -> trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c 2025-09-07T06:38:57.6619593Z * [new tag] trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c -> trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c 2025-09-07T06:38:57.6620642Z * [new tag] trunk/b919560c4a7010e2d89facee25586269a994746e -> trunk/b919560c4a7010e2d89facee25586269a994746e 2025-09-07T06:38:57.6622615Z * [new tag] trunk/e3068cdb446adefb5a875616ba37a60235391439 -> trunk/e3068cdb446adefb5a875616ba37a60235391439 2025-09-07T06:38:57.6623685Z * [new tag] trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 -> trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 2025-09-07T06:38:57.6624986Z * [new tag] trunk/fea20775ad96bdca972a1811d7d3372f368614ab -> trunk/fea20775ad96bdca972a1811d7d3372f368614ab 2025-09-07T06:38:57.7606053Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T06:38:57.7644803Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:38:57.7651264Z ##[endgroup] 2025-09-07T06:38:57.7651944Z ##[group]Determining the checkout info 2025-09-07T06:38:57.7652708Z ##[endgroup] 2025-09-07T06:38:57.7655863Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T06:38:57.7866673Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T06:38:57.7907929Z ##[group]Checking out the ref 2025-09-07T06:38:57.7911388Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:38:58.0089009Z Previous HEAD position was 9aedb3cd87b [AOTI-FX] Support registering custom FX backends (#162317) 2025-09-07T06:38:58.0106931Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 
2025-09-07T06:38:58.0170289Z ##[endgroup] 2025-09-07T06:38:58.0170887Z ##[group]Setting up auth for fetching submodules 2025-09-07T06:38:58.0178797Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T06:38:58.0230552Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-09-07T06:38:58.0270331Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-09-07T06:38:58.0312501Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-09-07T06:38:58.0349722Z ##[endgroup] 2025-09-07T06:38:58.0350353Z ##[group]Fetching submodules 2025-09-07T06:38:58.0352647Z [command]/usr/bin/git submodule sync --recursive 2025-09-07T06:38:58.0738506Z Synchronizing submodule url for 'android/libs/fbjni' 2025-09-07T06:38:58.0795927Z Synchronizing submodule url for 'third_party/FP16' 2025-09-07T06:38:58.0853616Z Synchronizing submodule url for 'third_party/FXdiv' 2025-09-07T06:38:58.0914533Z Synchronizing submodule url for 'third_party/NNPACK' 2025-09-07T06:38:58.0969436Z Synchronizing submodule url for 'third_party/NVTX' 2025-09-07T06:38:58.1025755Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-09-07T06:38:58.1084090Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-09-07T06:38:58.1164690Z Synchronizing submodule url for 'third_party/aiter' 2025-09-07T06:38:58.1216342Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:38:58.1283832Z Synchronizing submodule url for 'third_party/benchmark' 2025-09-07T06:38:58.1340864Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-09-07T06:38:58.1409244Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-09-07T06:38:58.1466317Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-09-07T06:38:58.1523658Z Synchronizing submodule url for 
'third_party/cudnn_frontend' 2025-09-07T06:38:58.1579704Z Synchronizing submodule url for 'third_party/cutlass' 2025-09-07T06:38:58.1649745Z Synchronizing submodule url for 'third_party/fbgemm' 2025-09-07T06:38:58.1706022Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-09-07T06:38:58.1757254Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:38:58.1818393Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:38:58.1867614Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-09-07T06:38:58.1931315Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-09-07T06:38:58.1985197Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:38:58.2036988Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-09-07T06:38:58.2097826Z Synchronizing submodule url for 'third_party/flash-attention' 2025-09-07T06:38:58.2148401Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:38:58.2209662Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:38:58.2282784Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-09-07T06:38:58.2342450Z Synchronizing submodule url for 'third_party/fmt' 2025-09-07T06:38:58.2397770Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:38:58.2460023Z Synchronizing submodule url for 'third_party/gloo' 2025-09-07T06:38:58.2513205Z Synchronizing submodule url for 'third_party/googletest' 2025-09-07T06:38:58.2566392Z Synchronizing submodule url for 'third_party/ideep' 2025-09-07T06:38:58.2622676Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-09-07T06:38:58.2686465Z Synchronizing submodule url for 'third_party/ittapi' 2025-09-07T06:38:58.2747575Z Synchronizing submodule url for 'third_party/kineto' 2025-09-07T06:38:58.2800362Z 
Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:38:58.2846898Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:38:58.2901861Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:38:58.2950033Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:38:58.3002641Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:38:58.3047080Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:38:58.3101970Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:38:58.3148447Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:38:58.3201024Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:38:58.3252201Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:38:58.3309769Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:38:58.3360824Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:38:58.3418399Z Synchronizing submodule url for 'third_party/kleidiai' 2025-09-07T06:38:58.3473176Z Synchronizing submodule url for 'third_party/mimalloc' 2025-09-07T06:38:58.3527384Z Synchronizing submodule url for 'third_party/nlohmann' 2025-09-07T06:38:58.3590222Z Synchronizing submodule url for 'third_party/onnx' 2025-09-07T06:38:58.3661123Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-09-07T06:38:58.3720848Z Synchronizing submodule url for 
'third_party/opentelemetry-cpp' 2025-09-07T06:38:58.3777376Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:38:58.3828927Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:38:58.3880765Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:38:58.3930163Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:38:58.3984571Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:38:58.4030493Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:38:58.4082378Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:38:58.4127791Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:38:58.4184804Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:38:58.4236213Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:38:58.4310978Z Synchronizing submodule url for 'third_party/pocketfft' 2025-09-07T06:38:58.4367975Z Synchronizing submodule url for 'third_party/protobuf' 2025-09-07T06:38:58.4426946Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:38:58.4473865Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-09-07T06:38:58.4533292Z Synchronizing submodule url for 'third_party/psimd' 2025-09-07T06:38:58.4591683Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-09-07T06:38:58.4648426Z Synchronizing submodule url for 'third_party/pybind11' 2025-09-07T06:38:58.4707364Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-09-07T06:38:58.4763060Z 
Synchronizing submodule url for 'third_party/sleef' 2025-09-07T06:38:58.4823448Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-09-07T06:38:58.4873066Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:38:58.4924127Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:38:58.4973148Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:38:58.5025370Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:38:58.5070706Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:38:58.5151911Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-09-07T06:38:58.5881780Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-09-07T06:38:58.6231467Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-09-07T06:38:58.6581269Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-09-07T06:38:58.6941759Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-09-07T06:38:58.7309283Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-09-07T06:38:58.7665963Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-09-07T06:38:58.8225964Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-09-07T06:38:58.8734625Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-09-07T06:38:58.9267701Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-09-07T06:38:58.9669540Z Submodule path 'third_party/benchmark': 
checked out '299e5928955cc62af9968370293b916f5130916f' 2025-09-07T06:38:59.0249824Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-09-07T06:38:59.0647627Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-09-07T06:38:59.1007847Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-09-07T06:38:59.1395629Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-09-07T06:38:59.1861874Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T06:38:59.7100449Z From https://github.com/pytorch/fbgemm 2025-09-07T06:38:59.7100955Z dada912f..75e10fd0 gh-pages -> origin/gh-pages 2025-09-07T06:38:59.7103044Z 10367cc8..b027a7cf main -> origin/main 2025-09-07T06:38:59.7104636Z 43ddd069..6502af0b nightly -> origin/nightly 2025-09-07T06:38:59.8157244Z Fetching submodule external/composable_kernel 2025-09-07T06:39:00.8912367Z From https://github.com/jwfromm/composable_kernel 2025-09-07T06:39:00.8913194Z * branch 7fe50dc3da2069d6645d9deb8c017a876472a977 -> FETCH_HEAD 2025-09-07T06:39:00.9655509Z Submodule path 'third_party/fbgemm': checked out '4b39c551efe15e6bbade20565b0ceb2d8ce3352d' 2025-09-07T06:39:00.9980071Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-09-07T06:39:01.0377165Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-09-07T06:39:01.0719337Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-09-07T06:39:01.2073909Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '311f3c8e51dc0eb56310cfc6980bf63d0fbd7917' 2025-09-07T06:39:01.2426785Z Submodule path 'third_party/fbgemm/external/googletest': checked out 
'52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T06:39:01.2756712Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-09-07T06:39:01.3123911Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-09-07T06:39:01.3548201Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-09-07T06:39:01.4049456Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-09-07T06:39:01.4481600Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-09-07T06:39:01.4934264Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-09-07T06:39:01.5305255Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-09-07T06:39:01.5653655Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-09-07T06:39:01.6020419Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-09-07T06:39:01.6386121Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T06:39:01.6748778Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-09-07T06:39:01.7269469Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-09-07T06:39:01.7660322Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-09-07T06:39:01.8037108Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-09-07T06:39:01.8392934Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 
'7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-09-07T06:39:01.8737567Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-09-07T06:39:01.9076525Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-09-07T06:39:01.9418944Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-09-07T06:39:01.9740449Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-09-07T06:39:02.0055419Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-09-07T06:39:02.0396906Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-09-07T06:39:02.0734100Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-09-07T06:39:02.1125457Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-09-07T06:39:02.1462340Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-09-07T06:39:02.1797868Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-09-07T06:39:02.2124875Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-09-07T06:39:02.2511748Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 
2025-09-07T06:39:02.2898623Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-09-07T06:39:02.3305912Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-09-07T06:39:02.3865829Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-09-07T06:39:02.4267100Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-09-07T06:39:02.4702330Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-09-07T06:39:02.5021958Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-09-07T06:39:02.5353265Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-09-07T06:39:02.5649712Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-09-07T06:39:02.6013710Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-09-07T06:39:02.6334086Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-09-07T06:39:02.6646551Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-09-07T06:39:02.6967314Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-09-07T06:39:02.7320958Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-09-07T06:39:02.7663677Z Submodule path 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-09-07T06:39:02.8178623Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-09-07T06:39:02.8561698Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-09-07T06:39:02.9163549Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-09-07T06:39:02.9492311Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-09-07T06:39:02.9820358Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-09-07T06:39:03.0165491Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-09-07T06:39:03.0514611Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-09-07T06:39:03.0897825Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-09-07T06:39:03.1236797Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-09-07T06:39:03.1587961Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-09-07T06:39:03.1953235Z Submodule path 'third_party/tensorpipe': checked out 'af0118d13e52f5a08841464a768e01a0bf3e3075' 2025-09-07T06:39:03.2265684Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-09-07T06:39:03.2568965Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-09-07T06:39:03.3037606Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 
2025-09-07T06:39:03.3379551Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-09-07T06:39:03.3673472Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-09-07T06:39:03.3810696Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-09-07T06:39:03.4181064Z Entering 'android/libs/fbjni' 2025-09-07T06:39:03.4239193Z Entering 'third_party/FP16' 2025-09-07T06:39:03.4303616Z Entering 'third_party/FXdiv' 2025-09-07T06:39:03.4364489Z Entering 'third_party/NNPACK' 2025-09-07T06:39:03.4425412Z Entering 'third_party/NVTX' 2025-09-07T06:39:03.4491740Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:03.4555411Z Entering 'third_party/XNNPACK' 2025-09-07T06:39:03.4639627Z Entering 'third_party/aiter' 2025-09-07T06:39:03.4703060Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:03.4771443Z Entering 'third_party/benchmark' 2025-09-07T06:39:03.4832413Z Entering 'third_party/composable_kernel' 2025-09-07T06:39:03.4901587Z Entering 'third_party/cpp-httplib' 2025-09-07T06:39:03.4959653Z Entering 'third_party/cpuinfo' 2025-09-07T06:39:03.5024160Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:39:03.5086036Z Entering 'third_party/cutlass' 2025-09-07T06:39:03.5159962Z Entering 'third_party/fbgemm' 2025-09-07T06:39:03.5225546Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:03.5282899Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:03.5352533Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:03.5411659Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:03.5482739Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:03.5543523Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:03.5598633Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:39:03.5661376Z 
Entering 'third_party/flash-attention' 2025-09-07T06:39:03.5720206Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:03.5784850Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:03.5854015Z Entering 'third_party/flatbuffers' 2025-09-07T06:39:03.5921431Z Entering 'third_party/fmt' 2025-09-07T06:39:03.5985303Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:03.6043515Z Entering 'third_party/gloo' 2025-09-07T06:39:03.6104517Z Entering 'third_party/googletest' 2025-09-07T06:39:03.6160885Z Entering 'third_party/ideep' 2025-09-07T06:39:03.6223856Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:03.6291685Z Entering 'third_party/ittapi' 2025-09-07T06:39:03.6355046Z Entering 'third_party/kineto' 2025-09-07T06:39:03.6413998Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:03.6468453Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:03.6526538Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:03.6585540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:39:03.6638821Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:03.6692321Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:03.6752150Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:03.6805833Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:03.6866721Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:03.6923185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:03.6989991Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:03.7046967Z Entering 'third_party/kineto/libkineto/third_party/googletest' 
2025-09-07T06:39:03.7108985Z Entering 'third_party/kleidiai' 2025-09-07T06:39:03.7169683Z Entering 'third_party/mimalloc' 2025-09-07T06:39:03.7230981Z Entering 'third_party/nlohmann' 2025-09-07T06:39:03.7297702Z Entering 'third_party/onnx' 2025-09-07T06:39:03.7379688Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:03.7445345Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:39:03.7509209Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:03.7563487Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:03.7622024Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:03.7679242Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:03.7737097Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:03.7792050Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:03.7848005Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:39:03.7905301Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:39:03.7963265Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:39:03.8030709Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:03.8108057Z Entering 'third_party/pocketfft' 2025-09-07T06:39:03.8165914Z Entering 'third_party/protobuf' 2025-09-07T06:39:03.8231732Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:39:03.8287845Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:39:03.8352155Z Entering 'third_party/psimd' 2025-09-07T06:39:03.8412088Z Entering 'third_party/pthreadpool' 2025-09-07T06:39:03.8472730Z Entering 'third_party/pybind11' 2025-09-07T06:39:03.8532988Z Entering 'third_party/python-peachpy' 2025-09-07T06:39:03.8593141Z Entering 'third_party/sleef' 2025-09-07T06:39:03.8653241Z 
Entering 'third_party/tensorpipe' 2025-09-07T06:39:03.8713096Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:39:03.8769855Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:39:03.8829244Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:39:03.8881237Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:39:03.8929097Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:39:03.9017216Z ##[endgroup] 2025-09-07T06:39:03.9017825Z ##[group]Persisting credentials for submodules 2025-09-07T06:39:03.9024425Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-09-07T06:39:03.9383289Z Entering 'android/libs/fbjni' 2025-09-07T06:39:03.9424712Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9425206Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9478590Z Entering 'third_party/FP16' 2025-09-07T06:39:03.9516510Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9516936Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9570896Z Entering 'third_party/FXdiv' 2025-09-07T06:39:03.9609590Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9610022Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9664070Z Entering 'third_party/NNPACK' 2025-09-07T06:39:03.9706710Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9707032Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9759637Z Entering 'third_party/NVTX' 2025-09-07T06:39:03.9796191Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9796603Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9851252Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:03.9889272Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9889704Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9944173Z Entering 
'third_party/XNNPACK' 2025-09-07T06:39:03.9987231Z url.https://github.com/.insteadof 2025-09-07T06:39:03.9987662Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0057734Z Entering 'third_party/aiter' 2025-09-07T06:39:04.0095360Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0095788Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0144802Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:04.0179260Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0179682Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0240754Z Entering 'third_party/benchmark' 2025-09-07T06:39:04.0277225Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0277525Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0331168Z Entering 'third_party/composable_kernel' 2025-09-07T06:39:04.0369579Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0369986Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0431958Z Entering 'third_party/cpp-httplib' 2025-09-07T06:39:04.0469317Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0469727Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0521694Z Entering 'third_party/cpuinfo' 2025-09-07T06:39:04.0557527Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0557845Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0607303Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:39:04.0639579Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0640003Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0693187Z Entering 'third_party/cutlass' 2025-09-07T06:39:04.0729072Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0729835Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0795948Z Entering 'third_party/fbgemm' 2025-09-07T06:39:04.0837443Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0837849Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0886097Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:04.0918471Z 
url.https://github.com/.insteadof 2025-09-07T06:39:04.0918903Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0965983Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:04.0998509Z url.https://github.com/.insteadof 2025-09-07T06:39:04.0998943Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1056091Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:04.1091704Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1092190Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1145263Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:04.1180475Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1180950Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1237914Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:04.1276231Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1276658Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1325549Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:04.1358863Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1359292Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1404088Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:39:04.1438967Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1439223Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1491949Z Entering 'third_party/flash-attention' 2025-09-07T06:39:04.1530539Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1530955Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1584703Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:04.1621184Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1621615Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1676039Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:04.1711154Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1711576Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1775969Z Entering 
'third_party/flatbuffers' 2025-09-07T06:39:04.1814075Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1814494Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1871453Z Entering 'third_party/fmt' 2025-09-07T06:39:04.1912333Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1912748Z url.https://github.com/.insteadof 2025-09-07T06:39:04.1964701Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:04.2000797Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2001208Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2049800Z Entering 'third_party/gloo' 2025-09-07T06:39:04.2089421Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2089843Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2145740Z Entering 'third_party/googletest' 2025-09-07T06:39:04.2176781Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2177222Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2229554Z Entering 'third_party/ideep' 2025-09-07T06:39:04.2271812Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2272221Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2321887Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:04.2358553Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2358971Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2418786Z Entering 'third_party/ittapi' 2025-09-07T06:39:04.2458557Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2458972Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2511028Z Entering 'third_party/kineto' 2025-09-07T06:39:04.2552911Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2553332Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2604173Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:04.2640126Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2640545Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2687066Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:04.2721009Z 
url.https://github.com/.insteadof 2025-09-07T06:39:04.2721428Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2771437Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:04.2802401Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2802654Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2855262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:39:04.2891750Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2892178Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2941675Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:04.2974571Z url.https://github.com/.insteadof 2025-09-07T06:39:04.2974986Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3026432Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:04.3063683Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3064102Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3115458Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:04.3154061Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3154479Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3204736Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:04.3240527Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3240796Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3289320Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:04.3322472Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3322880Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3371938Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:04.3405508Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3405915Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3460970Z 
Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:04.3495602Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3496026Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3548569Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:39:04.3581040Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3581462Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3634274Z Entering 'third_party/kleidiai' 2025-09-07T06:39:04.3674008Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3674487Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3728032Z Entering 'third_party/mimalloc' 2025-09-07T06:39:04.3763818Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3764242Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3813268Z Entering 'third_party/nlohmann' 2025-09-07T06:39:04.3853193Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3853680Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3910369Z Entering 'third_party/onnx' 2025-09-07T06:39:04.3945773Z url.https://github.com/.insteadof 2025-09-07T06:39:04.3946225Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4010438Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:04.4044462Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4044900Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4098926Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:39:04.4135499Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4135926Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4191562Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:04.4224845Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4225257Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4272593Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:04.4305259Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4305714Z url.https://github.com/.insteadof 
2025-09-07T06:39:04.4353254Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:04.4387398Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4387706Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4434402Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:04.4468533Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4469028Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4516119Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:04.4553974Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4554385Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4598762Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:04.4634149Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4634580Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4681706Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:39:04.4712656Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4713074Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4758306Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:39:04.4794022Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4794484Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4844445Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:39:04.4880348Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4880825Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4936258Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:04.4970404Z url.https://github.com/.insteadof 2025-09-07T06:39:04.4970820Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5048758Z Entering 'third_party/pocketfft' 2025-09-07T06:39:04.5087288Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5087725Z url.https://github.com/.insteadof 
2025-09-07T06:39:04.5143009Z Entering 'third_party/protobuf' 2025-09-07T06:39:04.5178660Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5179100Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5229370Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:39:04.5268297Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5268707Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5317839Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:39:04.5351399Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5351819Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5406224Z Entering 'third_party/psimd' 2025-09-07T06:39:04.5447048Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5447500Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5502949Z Entering 'third_party/pthreadpool' 2025-09-07T06:39:04.5534193Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5535032Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5585353Z Entering 'third_party/pybind11' 2025-09-07T06:39:04.5628022Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5628350Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5680577Z Entering 'third_party/python-peachpy' 2025-09-07T06:39:04.5717418Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5717838Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5771418Z Entering 'third_party/sleef' 2025-09-07T06:39:04.5809363Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5809785Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5863682Z Entering 'third_party/tensorpipe' 2025-09-07T06:39:04.5895194Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5895607Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5943938Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:39:04.5979487Z url.https://github.com/.insteadof 2025-09-07T06:39:04.5979896Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6025951Z Entering 
'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:39:04.6062010Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6062598Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6111904Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:39:04.6150061Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6150367Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6199503Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:39:04.6235912Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6236366Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6282202Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:39:04.6317326Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6317758Z url.https://github.com/.insteadof 2025-09-07T06:39:04.6391928Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-09-07T06:39:04.6765252Z Entering 'android/libs/fbjni' 2025-09-07T06:39:04.6821483Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-09-07T06:39:04.6853083Z Entering 'third_party/FP16' 2025-09-07T06:39:04.6916006Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-09-07T06:39:04.6949335Z Entering 'third_party/FXdiv' 2025-09-07T06:39:04.7010544Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-09-07T06:39:04.7042146Z Entering 'third_party/NNPACK' 2025-09-07T06:39:04.7100799Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-09-07T06:39:04.7128366Z Entering 'third_party/NVTX' 2025-09-07T06:39:04.7183262Z 
file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-09-07T06:39:04.7216254Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:04.7275325Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-09-07T06:39:04.7305204Z Entering 'third_party/XNNPACK' 2025-09-07T06:39:04.7355340Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-09-07T06:39:04.7405082Z Entering 'third_party/aiter' 2025-09-07T06:39:04.7462128Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-09-07T06:39:04.7490029Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:04.7551001Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-09-07T06:39:04.7594426Z Entering 'third_party/benchmark' 2025-09-07T06:39:04.7653608Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-09-07T06:39:04.7683707Z Entering 'third_party/composable_kernel' 2025-09-07T06:39:04.7741978Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-09-07T06:39:04.7779760Z Entering 'third_party/cpp-httplib' 2025-09-07T06:39:04.7837992Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-09-07T06:39:04.7870129Z Entering 'third_party/cpuinfo' 2025-09-07T06:39:04.7934098Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-09-07T06:39:04.7964683Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:39:04.8022834Z 
file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-09-07T06:39:04.8050560Z Entering 'third_party/cutlass' 2025-09-07T06:39:04.8103869Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-09-07T06:39:04.8143844Z Entering 'third_party/fbgemm' 2025-09-07T06:39:04.8200104Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-09-07T06:39:04.8231920Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:04.8294597Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-09-07T06:39:04.8323698Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:04.8382528Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-09-07T06:39:04.8415728Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:04.8474288Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-09-07T06:39:04.8507810Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:04.8559642Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-09-07T06:39:04.8600127Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:04.8658063Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-09-07T06:39:04.8684882Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:04.8743598Z 
file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-09-07T06:39:04.8768123Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:39:04.8823526Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-09-07T06:39:04.8854549Z Entering 'third_party/flash-attention' 2025-09-07T06:39:04.8907554Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-09-07T06:39:04.8937859Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:04.8995737Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-09-07T06:39:04.9036036Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:04.9095281Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-09-07T06:39:04.9139187Z Entering 'third_party/flatbuffers' 2025-09-07T06:39:04.9196582Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-09-07T06:39:04.9234204Z Entering 'third_party/fmt' 2025-09-07T06:39:04.9296125Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-09-07T06:39:04.9327171Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:04.9385475Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-09-07T06:39:04.9412964Z Entering 'third_party/gloo' 2025-09-07T06:39:04.9474456Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config 
remote.origin.url 2025-09-07T06:39:04.9504936Z Entering 'third_party/googletest' 2025-09-07T06:39:04.9561218Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:39:04.9594097Z Entering 'third_party/ideep' 2025-09-07T06:39:04.9656690Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-09-07T06:39:04.9684743Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:04.9744382Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-09-07T06:39:04.9784159Z Entering 'third_party/ittapi' 2025-09-07T06:39:04.9843464Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-09-07T06:39:04.9875110Z Entering 'third_party/kineto' 2025-09-07T06:39:04.9928792Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-09-07T06:39:04.9957397Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:05.0015940Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-09-07T06:39:05.0040102Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:05.0097226Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-09-07T06:39:05.0124302Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:05.0178866Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 
2025-09-07T06:39:05.0204131Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:39:05.0258632Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-09-07T06:39:05.0283855Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:05.0337412Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-09-07T06:39:05.0360839Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:05.0418133Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-09-07T06:39:05.0447388Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:05.0507337Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-09-07T06:39:05.0541795Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:05.0595463Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:39:05.0625481Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:05.0674728Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-09-07T06:39:05.0706008Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:05.0768499Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-09-07T06:39:05.0802364Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:05.0860593Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-09-07T06:39:05.0890446Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:39:05.0949094Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-09-07T06:39:05.0985398Z Entering 'third_party/kleidiai' 2025-09-07T06:39:05.1035926Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-09-07T06:39:05.1067003Z Entering 'third_party/mimalloc' 2025-09-07T06:39:05.1129084Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-09-07T06:39:05.1159261Z Entering 'third_party/nlohmann' 2025-09-07T06:39:05.1220226Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-09-07T06:39:05.1254034Z Entering 'third_party/onnx' 2025-09-07T06:39:05.1313664Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-09-07T06:39:05.1364044Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:05.1420261Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-09-07T06:39:05.1452181Z Entering 'third_party/opentelemetry-cpp' 
2025-09-07T06:39:05.1510563Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-09-07T06:39:05.1545050Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:05.1597870Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-09-07T06:39:05.1625060Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:05.1677523Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:39:05.1704668Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:05.1757163Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-09-07T06:39:05.1783686Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:05.1838686Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-09-07T06:39:05.1869557Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:05.1932109Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-09-07T06:39:05.1959392Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:05.2019004Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-09-07T06:39:05.2044317Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 
2025-09-07T06:39:05.2100858Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-09-07T06:39:05.2124055Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:39:05.2179804Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-09-07T06:39:05.2209560Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:39:05.2270523Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-09-07T06:39:05.2303718Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:05.2357747Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-09-07T06:39:05.2413279Z Entering 'third_party/pocketfft' 2025-09-07T06:39:05.2472166Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-09-07T06:39:05.2503154Z Entering 'third_party/protobuf' 2025-09-07T06:39:05.2558551Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-09-07T06:39:05.2592351Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:39:05.2652145Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-09-07T06:39:05.2681734Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:39:05.2740674Z 
file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:39:05.2771363Z Entering 'third_party/psimd' 2025-09-07T06:39:05.2833745Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-09-07T06:39:05.2864282Z Entering 'third_party/pthreadpool' 2025-09-07T06:39:05.2919288Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-09-07T06:39:05.2948841Z Entering 'third_party/pybind11' 2025-09-07T06:39:05.3012553Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-09-07T06:39:05.3044335Z Entering 'third_party/python-peachpy' 2025-09-07T06:39:05.3102280Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-09-07T06:39:05.3129695Z Entering 'third_party/sleef' 2025-09-07T06:39:05.3183654Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-09-07T06:39:05.3216365Z Entering 'third_party/tensorpipe' 2025-09-07T06:39:05.3275766Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-09-07T06:39:05.3307369Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:39:05.3359159Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:39:05.3389597Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:39:05.3438670Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 
2025-09-07T06:39:05.3468033Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:39:05.3518941Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-09-07T06:39:05.3549658Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:39:05.3597693Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-09-07T06:39:05.3625950Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:39:05.3677045Z file:/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-09-07T06:39:05.3983273Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-09-07T06:39:05.4348845Z Entering 'android/libs/fbjni' 2025-09-07T06:39:05.4407726Z Entering 'third_party/FP16' 2025-09-07T06:39:05.4470840Z Entering 'third_party/FXdiv' 2025-09-07T06:39:05.4526930Z Entering 'third_party/NNPACK' 2025-09-07T06:39:05.4591262Z Entering 'third_party/NVTX' 2025-09-07T06:39:05.4648389Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:05.4713137Z Entering 'third_party/XNNPACK' 2025-09-07T06:39:05.4792535Z Entering 'third_party/aiter' 2025-09-07T06:39:05.4854184Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:05.4927307Z Entering 'third_party/benchmark' 2025-09-07T06:39:05.4992113Z Entering 'third_party/composable_kernel' 2025-09-07T06:39:05.5062922Z Entering 'third_party/cpp-httplib' 2025-09-07T06:39:05.5122034Z Entering 'third_party/cpuinfo' 2025-09-07T06:39:05.5180637Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:39:05.5237017Z Entering 'third_party/cutlass' 2025-09-07T06:39:05.5313781Z Entering 'third_party/fbgemm' 
2025-09-07T06:39:05.5382542Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:05.5435577Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:05.5503642Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:05.5557837Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:05.5631103Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:05.5684051Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:05.5743079Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:39:05.5803022Z Entering 'third_party/flash-attention' 2025-09-07T06:39:05.5865001Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:05.5929097Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:05.6000446Z Entering 'third_party/flatbuffers' 2025-09-07T06:39:05.6066317Z Entering 'third_party/fmt' 2025-09-07T06:39:05.6126368Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:05.6187075Z Entering 'third_party/gloo' 2025-09-07T06:39:05.6243913Z Entering 'third_party/googletest' 2025-09-07T06:39:05.6304528Z Entering 'third_party/ideep' 2025-09-07T06:39:05.6358046Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:05.6424628Z Entering 'third_party/ittapi' 2025-09-07T06:39:05.6480589Z Entering 'third_party/kineto' 2025-09-07T06:39:05.6542496Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:05.6596774Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:05.6659845Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:05.6715490Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:39:05.6772879Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:05.6826493Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:05.6882864Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:05.6944702Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:05.6998017Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:05.7055051Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:05.7118455Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:05.7176654Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:39:05.7233607Z Entering 'third_party/kleidiai' 2025-09-07T06:39:05.7291519Z Entering 'third_party/mimalloc' 2025-09-07T06:39:05.7365474Z Entering 'third_party/nlohmann' 2025-09-07T06:39:05.7428164Z Entering 'third_party/onnx' 2025-09-07T06:39:05.7505614Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:05.7565619Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:39:05.7627414Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:05.7681074Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:05.7740730Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:05.7797405Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:05.7855843Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:05.7910013Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:05.7963000Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:39:05.8023909Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:39:05.8082133Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:39:05.8143676Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:05.8225103Z Entering 
'third_party/pocketfft' 2025-09-07T06:39:05.8285869Z Entering 'third_party/protobuf' 2025-09-07T06:39:05.8350313Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:39:05.8407586Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:39:05.8469605Z Entering 'third_party/psimd' 2025-09-07T06:39:05.8529681Z Entering 'third_party/pthreadpool' 2025-09-07T06:39:05.8589133Z Entering 'third_party/pybind11' 2025-09-07T06:39:05.8650583Z Entering 'third_party/python-peachpy' 2025-09-07T06:39:05.8725533Z Entering 'third_party/sleef' 2025-09-07T06:39:05.8786026Z Entering 'third_party/tensorpipe' 2025-09-07T06:39:05.8843457Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:39:05.8906675Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:39:05.8962104Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:39:05.9012215Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:39:05.9066313Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:39:05.9153206Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-09-07T06:39:05.9520440Z Entering 'android/libs/fbjni' 2025-09-07T06:39:05.9584234Z Entering 'third_party/FP16' 2025-09-07T06:39:05.9641834Z Entering 'third_party/FXdiv' 2025-09-07T06:39:05.9698894Z Entering 'third_party/NNPACK' 2025-09-07T06:39:05.9762906Z Entering 'third_party/NVTX' 2025-09-07T06:39:05.9825396Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:05.9885933Z Entering 'third_party/XNNPACK' 2025-09-07T06:39:05.9964424Z Entering 'third_party/aiter' 2025-09-07T06:39:06.0030666Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:06.0099041Z Entering 'third_party/benchmark' 2025-09-07T06:39:06.0160884Z Entering 'third_party/composable_kernel' 2025-09-07T06:39:06.0233165Z Entering 'third_party/cpp-httplib' 
2025-09-07T06:39:06.0293304Z Entering 'third_party/cpuinfo' 2025-09-07T06:39:06.0357009Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:39:06.0421621Z Entering 'third_party/cutlass' 2025-09-07T06:39:06.0488364Z Entering 'third_party/fbgemm' 2025-09-07T06:39:06.0548709Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:06.0601374Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:06.0669474Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:06.0724478Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:06.0790917Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:06.0848394Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:06.0904671Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:39:06.0967708Z Entering 'third_party/flash-attention' 2025-09-07T06:39:06.1031428Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:06.1089472Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:06.1159384Z Entering 'third_party/flatbuffers' 2025-09-07T06:39:06.1226775Z Entering 'third_party/fmt' 2025-09-07T06:39:06.1286147Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:06.1346635Z Entering 'third_party/gloo' 2025-09-07T06:39:06.1403800Z Entering 'third_party/googletest' 2025-09-07T06:39:06.1464703Z Entering 'third_party/ideep' 2025-09-07T06:39:06.1522516Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:06.1594932Z Entering 'third_party/ittapi' 2025-09-07T06:39:06.1662973Z Entering 'third_party/kineto' 2025-09-07T06:39:06.1719555Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:06.1781717Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:06.1839262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:06.1903418Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 
2025-09-07T06:39:06.1959329Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:06.2021356Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:06.2082034Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:06.2143919Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:06.2197127Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:06.2253779Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:06.2319007Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:06.2383913Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:39:06.2441770Z Entering 'third_party/kleidiai' 2025-09-07T06:39:06.2506370Z Entering 'third_party/mimalloc' 2025-09-07T06:39:06.2567518Z Entering 'third_party/nlohmann' 2025-09-07T06:39:06.2629031Z Entering 'third_party/onnx' 2025-09-07T06:39:06.2711164Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:06.2772502Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:39:06.2831688Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:06.2884718Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:06.2945320Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:06.2997448Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:06.3048835Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:06.3105065Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:06.3156999Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:39:06.3211801Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 
2025-09-07T06:39:06.3270911Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:39:06.3332278Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:06.3412981Z Entering 'third_party/pocketfft' 2025-09-07T06:39:06.3471286Z Entering 'third_party/protobuf' 2025-09-07T06:39:06.3537460Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:39:06.3596035Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:39:06.3652356Z Entering 'third_party/psimd' 2025-09-07T06:39:06.3711519Z Entering 'third_party/pthreadpool' 2025-09-07T06:39:06.3775330Z Entering 'third_party/pybind11' 2025-09-07T06:39:06.3837719Z Entering 'third_party/python-peachpy' 2025-09-07T06:39:06.3906795Z Entering 'third_party/sleef' 2025-09-07T06:39:06.3964488Z Entering 'third_party/tensorpipe' 2025-09-07T06:39:06.4027130Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:39:06.4081066Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:39:06.4141329Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:39:06.4197592Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:39:06.4254975Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:39:06.4339398Z ##[endgroup] 2025-09-07T06:39:06.4391783Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T06:39:06.4427940Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:39:06.4557754Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-09-07T06:39:06.4558076Z cd "${GITHUB_WORKSPACE}" 2025-09-07T06:39:06.4558341Z # Clean stale submodule dirs 2025-09-07T06:39:06.4558600Z if [ -z "${NO_SUDO}" ]; then 2025-09-07T06:39:06.4558952Z  sudo git submodule foreach --recursive git clean -ffdx 2025-09-07T06:39:06.4559262Z else 2025-09-07T06:39:06.4559509Z  git submodule foreach --recursive git clean -ffdx 2025-09-07T06:39:06.4559798Z fi 2025-09-07T06:39:06.4598237Z shell: /usr/bin/bash 
--noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:06.4598570Z env: 2025-09-07T06:39:06.4598798Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:06.4599024Z NO_SUDO: true 2025-09-07T06:39:06.4599210Z ##[endgroup] 2025-09-07T06:39:06.5032735Z Entering 'android/libs/fbjni' 2025-09-07T06:39:06.5086872Z Entering 'third_party/FP16' 2025-09-07T06:39:06.5141151Z Entering 'third_party/FXdiv' 2025-09-07T06:39:06.5191529Z Entering 'third_party/NNPACK' 2025-09-07T06:39:06.5246725Z Entering 'third_party/NVTX' 2025-09-07T06:39:06.5311740Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:06.5366631Z Entering 'third_party/XNNPACK' 2025-09-07T06:39:06.5569465Z Entering 'third_party/aiter' 2025-09-07T06:39:06.5637666Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:06.5805007Z Entering 'third_party/benchmark' 2025-09-07T06:39:06.5861990Z Entering 'third_party/composable_kernel' 2025-09-07T06:39:06.6032898Z Entering 'third_party/cpp-httplib' 2025-09-07T06:39:06.6091550Z Entering 'third_party/cpuinfo' 2025-09-07T06:39:06.6152964Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:39:06.6205198Z Entering 'third_party/cutlass' 2025-09-07T06:39:06.6362941Z Entering 'third_party/fbgemm' 2025-09-07T06:39:06.6463844Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:06.6512859Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:06.6673598Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:06.6725374Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:06.6887796Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:06.6943727Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:06.6986663Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:39:06.7059643Z Entering 'third_party/flash-attention' 2025-09-07T06:39:06.7120936Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:06.7269579Z Entering 
'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:06.7414420Z Entering 'third_party/flatbuffers' 2025-09-07T06:39:06.7518157Z Entering 'third_party/fmt' 2025-09-07T06:39:06.7574133Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:06.7628373Z Entering 'third_party/gloo' 2025-09-07T06:39:06.7686074Z Entering 'third_party/googletest' 2025-09-07T06:39:06.7746314Z Entering 'third_party/ideep' 2025-09-07T06:39:06.7797794Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:06.7932803Z Entering 'third_party/ittapi' 2025-09-07T06:39:06.7988835Z Entering 'third_party/kineto' 2025-09-07T06:39:06.8042958Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:06.8096682Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:06.8166486Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:06.8216131Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:39:06.8267348Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:06.8307559Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:06.8354557Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:06.8401056Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:06.8456397Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:06.8519852Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:06.8572449Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:06.8624874Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:39:06.8680917Z Entering 'third_party/kleidiai' 2025-09-07T06:39:06.8742003Z Entering 'third_party/mimalloc' 2025-09-07T06:39:06.8796449Z Entering 
'third_party/nlohmann' 2025-09-07T06:39:06.8870898Z Entering 'third_party/onnx' 2025-09-07T06:39:06.9358383Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:06.9413685Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:39:06.9500344Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:06.9548842Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:06.9598809Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:06.9643033Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:06.9711045Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:06.9761493Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:06.9809017Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:39:06.9857843Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:39:06.9928551Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:39:06.9985648Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:07.0373741Z Entering 'third_party/pocketfft' 2025-09-07T06:39:07.0424416Z Entering 'third_party/protobuf' 2025-09-07T06:39:07.0542525Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:39:07.0592976Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:39:07.0656793Z Entering 'third_party/psimd' 2025-09-07T06:39:07.0709453Z Entering 'third_party/pthreadpool' 2025-09-07T06:39:07.0759948Z Entering 'third_party/pybind11' 2025-09-07T06:39:07.0815614Z Entering 'third_party/python-peachpy' 2025-09-07T06:39:07.0864686Z Entering 'third_party/sleef' 2025-09-07T06:39:07.0918641Z Entering 'third_party/tensorpipe' 2025-09-07T06:39:07.0974111Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:39:07.1023665Z Entering 
'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:39:07.1070323Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:39:07.1122917Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:39:07.1170120Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:39:07.1352748Z Prepare all required actions 2025-09-07T06:39:07.1353268Z Getting action download info 2025-09-07T06:39:07.2708811Z ##[group]Run ./.github/actions/setup-rocm 2025-09-07T06:39:07.2709091Z env: 2025-09-07T06:39:07.2709269Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:07.2709497Z ##[endgroup] 2025-09-07T06:39:07.2735129Z ##[group]Run dpkg -l | grep -E " rocm" 2025-09-07T06:39:07.2735418Z dpkg -l | grep -E " rocm" 2025-09-07T06:39:07.2776123Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:07.2776456Z env: 2025-09-07T06:39:07.2776641Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:07.2776858Z ##[endgroup] 2025-09-07T06:39:07.3014404Z ii rocm 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) software stack meta package 2025-09-07T06:39:07.3015333Z ii rocm-cmake 0.14.0.60303-74~22.04 amd64 rocm-cmake built using CMake 2025-09-07T06:39:07.3017785Z ii rocm-core 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3018721Z ii rocm-dbgapi 0.77.0.60303-74~22.04 amd64 Library to provide AMD GPU debugger API 2025-09-07T06:39:07.3019712Z ii rocm-debug-agent 2.0.3.60303-74~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent) 2025-09-07T06:39:07.3020888Z ii rocm-developer-tools 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3022000Z ii rocm-device-libs 1.0.0.60303-74~22.04 amd64 Radeon Open Compute - device libraries 2025-09-07T06:39:07.3022941Z ii rocm-gdb 15.2.60303-74~22.04 amd64 ROCgdb 2025-09-07T06:39:07.3023956Z ii rocm-hip-libraries 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 
2025-09-07T06:39:07.3025196Z ii rocm-hip-runtime 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3026001Z ii rocm-hip-runtime-dev 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3026648Z ii rocm-hip-sdk 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3027218Z ii rocm-language-runtime 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3027756Z ii rocm-llvm 18.0.0.25012.60303-74~22.04 amd64 ROCm core compiler 2025-09-07T06:39:07.3028285Z ii rocm-ml-libraries 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3028825Z ii rocm-ml-sdk 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3029314Z ii rocm-opencl 2.0.0.60303-74~22.04 amd64 clr built using CMake 2025-09-07T06:39:07.3029773Z ii rocm-opencl-dev 2.0.0.60303-74~22.04 amd64 clr built using CMake 2025-09-07T06:39:07.3030314Z ii rocm-opencl-runtime 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3030899Z ii rocm-opencl-sdk 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3031773Z ii rocm-openmp-sdk 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) OpenMP Software development Kit. 
2025-09-07T06:39:07.3032364Z ii rocm-smi-lib 7.4.0.60303-74~22.04 amd64 AMD System Management libraries 2025-09-07T06:39:07.3032875Z ii rocm-utils 6.3.3.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:39:07.3033404Z ii rocminfo 1.0.0.60303-74~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool 2025-09-07T06:39:07.3054121Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:39:07.3054627Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:39:07.3054976Z # shellcheck disable=SC2046 2025-09-07T06:39:07.3055254Z docker stop $(docker ps -q) || true 2025-09-07T06:39:07.3055550Z # Prune all stopped containers. 2025-09-07T06:39:07.3056010Z docker container prune -f 2025-09-07T06:39:07.3095543Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:07.3095865Z env: 2025-09-07T06:39:07.3096042Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:07.3096267Z ##[endgroup] 2025-09-07T06:39:18.2729504Z 5eb6689f82a8 2025-09-07T06:39:31.2417547Z Deleted Containers: 2025-09-07T06:39:31.2418179Z 5eb6689f82a84536d53b322aa49a0ff27a6b922c18094d15d1771d4b5645dd8e 2025-09-07T06:39:31.2418701Z 2025-09-07T06:39:31.2418861Z Total reclaimed space: 11.34GB 2025-09-07T06:39:31.2487175Z ##[group]Run cat /etc/os-release || true 2025-09-07T06:39:31.2487529Z cat /etc/os-release || true 2025-09-07T06:39:31.2487827Z cat /etc/apt/sources.list.d/rocm.list || true 2025-09-07T06:39:31.2488142Z cat /opt/rocm/.info/version || true 2025-09-07T06:39:31.2488402Z whoami 2025-09-07T06:39:31.2526956Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.2527328Z env: 2025-09-07T06:39:31.2527514Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.2527740Z ##[endgroup] 2025-09-07T06:39:31.2604558Z PRETTY_NAME="Ubuntu 22.04.4 LTS" 2025-09-07T06:39:31.2605013Z NAME="Ubuntu" 2025-09-07T06:39:31.2605325Z VERSION_ID="22.04" 2025-09-07T06:39:31.2605660Z VERSION="22.04.4 LTS (Jammy 
Jellyfish)" 2025-09-07T06:39:31.2606106Z VERSION_CODENAME=jammy 2025-09-07T06:39:31.2606427Z ID=ubuntu 2025-09-07T06:39:31.2606714Z ID_LIKE=debian 2025-09-07T06:39:31.2607079Z HOME_URL="https://www.ubuntu.com/" 2025-09-07T06:39:31.2607540Z SUPPORT_URL="https://help.ubuntu.com/" 2025-09-07T06:39:31.2608086Z BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" 2025-09-07T06:39:31.2608850Z PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 2025-09-07T06:39:31.2609536Z UBUNTU_CODENAME=jammy 2025-09-07T06:39:31.2626498Z deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3.3 jammy main 2025-09-07T06:39:31.2638820Z 6.3.3-74 2025-09-07T06:39:31.2668781Z pytorchci 2025-09-07T06:39:31.2702976Z ##[group]Run dpkg -l | grep -E " amdgpu" 2025-09-07T06:39:31.2703435Z dpkg -l | grep -E " amdgpu" 2025-09-07T06:39:31.2744649Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.2745104Z env: 2025-09-07T06:39:31.2745435Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.2745815Z ##[endgroup] 2025-09-07T06:39:31.2990440Z ii amdgpu-core 1:6.3.60303-2119913.22.04 all Core meta package for unified amdgpu driver. 2025-09-07T06:39:31.2991288Z ii amdgpu-dkms 1:6.10.5.60303-2119913.22.04 all amdgpu driver in DKMS format. 
2025-09-07T06:39:31.2991975Z ii amdgpu-dkms-firmware 1:6.10.5.60303-2119913.22.04 all firmware blobs used by amdgpu driver in DKMS format 2025-09-07T06:39:31.2993050Z ii amdgpu-install 6.3.60303-2119913.22.04 all AMDGPU driver repository and installer 2025-09-07T06:39:31.3027320Z ##[group]Run rocm-smi 2025-09-07T06:39:31.3027644Z rocm-smi 2025-09-07T06:39:31.3065734Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.3066302Z env: 2025-09-07T06:39:31.3066694Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.3067150Z ##[endgroup] 2025-09-07T06:39:31.4616224Z 2025-09-07T06:39:31.4616254Z 2025-09-07T06:39:31.4617081Z ========================================= ROCm System Management Interface ========================================= 2025-09-07T06:39:31.4618123Z =================================================== Concise Info =================================================== 2025-09-07T06:39:31.4619446Z Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2025-09-07T06:39:31.4620979Z  (DID, GUID) (Edge) (Avg) (Mem, Compute, ID)  2025-09-07T06:39:31.4622437Z ==================================================================================================================== 2025-09-07T06:39:31.4623855Z 0 4 0x740c, 57586 35.0°C 94.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2025-09-07T06:39:31.4625058Z 1 5 0x740c, 45873 34.0°C N/A N/A, N/A, 0 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2025-09-07T06:39:31.4626084Z 2 2 0x740c, 51627 34.0°C 96.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2025-09-07T06:39:31.4627314Z 3 3 0x740c, 64489 30.0°C N/A N/A, N/A, 0 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2025-09-07T06:39:31.4628439Z 4 8 0x740c, 30939 35.0°C 95.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2025-09-07T06:39:31.4629479Z 5 9 0x740c, 8466 24.0°C N/A N/A, N/A, 0 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2025-09-07T06:39:31.4630254Z 6 6 0x740c, 41154 34.0°C 89.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 
2025-09-07T06:39:31.4630892Z 7 7 0x740c, 63755 31.0°C N/A N/A, N/A, 0 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2025-09-07T06:39:31.4632521Z ==================================================================================================================== 2025-09-07T06:39:31.4633065Z =============================================== End of ROCm SMI Log ================================================ 2025-09-07T06:39:31.4779061Z ##[group]Run rocminfo 2025-09-07T06:39:31.4779471Z rocminfo 2025-09-07T06:39:31.4813026Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.4813514Z env: 2025-09-07T06:39:31.4813939Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.4814233Z ##[endgroup] 2025-09-07T06:39:31.6126461Z ROCk module version 6.10.5 is loaded 2025-09-07T06:39:31.6126855Z ===================== 2025-09-07T06:39:31.6127212Z HSA System Attributes 2025-09-07T06:39:31.6127500Z ===================== 2025-09-07T06:39:31.6127799Z Runtime Version: 1.14 2025-09-07T06:39:31.6128120Z Runtime Ext Version: 1.6 2025-09-07T06:39:31.6128516Z System Timestamp Freq.: 1000.000000MHz 2025-09-07T06:39:31.6129086Z Sig. 
Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-09-07T06:39:31.6129738Z Machine Model: LARGE 2025-09-07T06:39:31.6130233Z System Endianness: LITTLE 2025-09-07T06:39:31.6130675Z Mwaitx: DISABLED 2025-09-07T06:39:31.6131012Z DMAbuf Support: YES 2025-09-07T06:39:31.6131230Z 2025-09-07T06:39:31.6131331Z ========== 2025-09-07T06:39:31.6131644Z HSA Agents 2025-09-07T06:39:31.6131933Z ========== 2025-09-07T06:39:31.6132237Z ******* 2025-09-07T06:39:31.6132516Z Agent 1 2025-09-07T06:39:31.6133082Z ******* 2025-09-07T06:39:31.6133401Z Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:39:31.6133950Z Uuid: CPU-XX 2025-09-07T06:39:31.6134382Z Marketing Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:39:31.6134843Z Vendor Name: CPU 2025-09-07T06:39:31.6135262Z Feature: None specified 2025-09-07T06:39:31.6135666Z Profile: FULL_PROFILE 2025-09-07T06:39:31.6136093Z Float Round Mode: NEAR 2025-09-07T06:39:31.6136515Z Max Queue Number: 0(0x0) 2025-09-07T06:39:31.6136941Z Queue Min Size: 0(0x0) 2025-09-07T06:39:31.6137353Z Queue Max Size: 0(0x0) 2025-09-07T06:39:31.6137765Z Queue Type: MULTI 2025-09-07T06:39:31.6138327Z Node: 0 2025-09-07T06:39:31.6138708Z Device Type: CPU 2025-09-07T06:39:31.6139076Z Cache Info: 2025-09-07T06:39:31.6139367Z L1: 32768(0x8000) KB 2025-09-07T06:39:31.6139784Z Chip ID: 0(0x0) 2025-09-07T06:39:31.6140189Z ASIC Revision: 0(0x0) 2025-09-07T06:39:31.6140616Z Cacheline Size: 64(0x40) 2025-09-07T06:39:31.6141047Z Max Clock Freq. (MHz): 2000 2025-09-07T06:39:31.6141440Z BDFID: 0 2025-09-07T06:39:31.6141864Z Internal Node ID: 0 2025-09-07T06:39:31.6142302Z Compute Unit: 64 2025-09-07T06:39:31.6142741Z SIMDs per CU: 0 2025-09-07T06:39:31.6143076Z Shader Engines: 0 2025-09-07T06:39:31.6143453Z Shader Arrs. per Eng.: 0 2025-09-07T06:39:31.6143821Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:39:31.6144122Z Memory Properties: 2025-09-07T06:39:31.6144348Z Features: None 2025-09-07T06:39:31.6144564Z Pool Info: 2025-09-07T06:39:31.6144782Z Pool 1 2025-09-07T06:39:31.6145074Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6145405Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:39:31.6145717Z Allocatable: TRUE 2025-09-07T06:39:31.6146045Z Alloc Granule: 4KB 2025-09-07T06:39:31.6146394Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6146740Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6147081Z Accessible by all: TRUE 2025-09-07T06:39:31.6147361Z Pool 2 2025-09-07T06:39:31.6147658Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6147988Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:39:31.6148320Z Allocatable: TRUE 2025-09-07T06:39:31.6148671Z Alloc Granule: 4KB 2025-09-07T06:39:31.6149018Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6149370Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6149698Z Accessible by all: TRUE 2025-09-07T06:39:31.6149991Z Pool 3 2025-09-07T06:39:31.6150424Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:39:31.6150748Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:39:31.6151063Z Allocatable: TRUE 2025-09-07T06:39:31.6151391Z Alloc Granule: 4KB 2025-09-07T06:39:31.6151751Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6152098Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6152442Z Accessible by all: TRUE 2025-09-07T06:39:31.6152734Z Pool 4 2025-09-07T06:39:31.6153004Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6153312Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:39:31.6153608Z Allocatable: TRUE 2025-09-07T06:39:31.6154063Z Alloc Granule: 4KB 2025-09-07T06:39:31.6154396Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6154771Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6155093Z Accessible by all: TRUE 2025-09-07T06:39:31.6155382Z ISA Info: 2025-09-07T06:39:31.6155581Z ******* 2025-09-07T06:39:31.6155782Z Agent 2 2025-09-07T06:39:31.6155975Z ******* 
2025-09-07T06:39:31.6156206Z Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:39:31.6156511Z Uuid: CPU-XX 2025-09-07T06:39:31.6156831Z Marketing Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:39:31.6157166Z Vendor Name: CPU 2025-09-07T06:39:31.6157475Z Feature: None specified 2025-09-07T06:39:31.6157792Z Profile: FULL_PROFILE 2025-09-07T06:39:31.6158110Z Float Round Mode: NEAR 2025-09-07T06:39:31.6158425Z Max Queue Number: 0(0x0) 2025-09-07T06:39:31.6158745Z Queue Min Size: 0(0x0) 2025-09-07T06:39:31.6159049Z Queue Max Size: 0(0x0) 2025-09-07T06:39:31.6159361Z Queue Type: MULTI 2025-09-07T06:39:31.6159648Z Node: 1 2025-09-07T06:39:31.6159944Z Device Type: CPU 2025-09-07T06:39:31.6160224Z Cache Info: 2025-09-07T06:39:31.6160451Z L1: 32768(0x8000) KB 2025-09-07T06:39:31.6160746Z Chip ID: 0(0x0) 2025-09-07T06:39:31.6161050Z ASIC Revision: 0(0x0) 2025-09-07T06:39:31.6161368Z Cacheline Size: 64(0x40) 2025-09-07T06:39:31.6161684Z Max Clock Freq. (MHz): 2000 2025-09-07T06:39:31.6161997Z BDFID: 0 2025-09-07T06:39:31.6162295Z Internal Node ID: 1 2025-09-07T06:39:31.6162621Z Compute Unit: 64 2025-09-07T06:39:31.6162941Z SIMDs per CU: 0 2025-09-07T06:39:31.6163248Z Shader Engines: 0 2025-09-07T06:39:31.6163578Z Shader Arrs. per Eng.: 0 2025-09-07T06:39:31.6163908Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:39:31.6164219Z Memory Properties: 2025-09-07T06:39:31.6164432Z Features: None 2025-09-07T06:39:31.6164787Z Pool Info: 2025-09-07T06:39:31.6165008Z Pool 1 2025-09-07T06:39:31.6165259Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6165566Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:39:31.6165862Z Allocatable: TRUE 2025-09-07T06:39:31.6166174Z Alloc Granule: 4KB 2025-09-07T06:39:31.6166498Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6166827Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6167144Z Accessible by all: TRUE 2025-09-07T06:39:31.6167418Z Pool 2 2025-09-07T06:39:31.6167668Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6167969Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:39:31.6168391Z Allocatable: TRUE 2025-09-07T06:39:31.6168721Z Alloc Granule: 4KB 2025-09-07T06:39:31.6169046Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6169374Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6169697Z Accessible by all: TRUE 2025-09-07T06:39:31.6169989Z Pool 3 2025-09-07T06:39:31.6170241Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:39:31.6170546Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:39:31.6170843Z Allocatable: TRUE 2025-09-07T06:39:31.6171169Z Alloc Granule: 4KB 2025-09-07T06:39:31.6171507Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6171857Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6172192Z Accessible by all: TRUE 2025-09-07T06:39:31.6172465Z Pool 4 2025-09-07T06:39:31.6172726Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6173024Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:39:31.6173336Z Allocatable: TRUE 2025-09-07T06:39:31.6173672Z Alloc Granule: 4KB 2025-09-07T06:39:31.6174100Z Alloc Recommended Granule:4KB 2025-09-07T06:39:31.6174453Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6174777Z Accessible by all: TRUE 2025-09-07T06:39:31.6175090Z ISA Info: 2025-09-07T06:39:31.6175300Z ******* 2025-09-07T06:39:31.6175499Z Agent 3 2025-09-07T06:39:31.6175687Z ******* 
2025-09-07T06:39:31.6175913Z Name: gfx90a 2025-09-07T06:39:31.6176359Z Uuid: GPU-fcde9f1dc11080c7 2025-09-07T06:39:31.6176688Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6177018Z Vendor Name: AMD 2025-09-07T06:39:31.6177324Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6177641Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6177959Z Float Round Mode: NEAR 2025-09-07T06:39:31.6178288Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6178601Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6179081Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6179410Z Queue Type: MULTI 2025-09-07T06:39:31.6179697Z Node: 2 2025-09-07T06:39:31.6179995Z Device Type: GPU 2025-09-07T06:39:31.6180271Z Cache Info: 2025-09-07T06:39:31.6180504Z L1: 16(0x10) KB 2025-09-07T06:39:31.6180767Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6181048Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6181365Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6181672Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6181991Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6182287Z BDFID: 12800 2025-09-07T06:39:31.6182770Z Internal Node ID: 2 2025-09-07T06:39:31.6183084Z Compute Unit: 104 2025-09-07T06:39:31.6183382Z SIMDs per CU: 4 2025-09-07T06:39:31.6183692Z Shader Engines: 8 2025-09-07T06:39:31.6184010Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6184343Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6184670Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6184963Z Memory Properties: 2025-09-07T06:39:31.6185187Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6185483Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6185808Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6186127Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6186430Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6186671Z x 1024(0x400) 2025-09-07T06:39:31.6186926Z y 1024(0x400) 2025-09-07T06:39:31.6187177Z z 1024(0x400) 2025-09-07T06:39:31.6187459Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6187774Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6188085Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6188369Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6188587Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6188853Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6189110Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6189423Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6195324Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6195718Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6196080Z IOMMU Support:: None 2025-09-07T06:39:31.6196362Z Pool Info: 2025-09-07T06:39:31.6196577Z Pool 1 2025-09-07T06:39:31.6196846Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6197181Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6197501Z Allocatable: TRUE 2025-09-07T06:39:31.6197835Z Alloc Granule: 4KB 2025-09-07T06:39:31.6198175Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6198510Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6199035Z Accessible by all: FALSE 2025-09-07T06:39:31.6199325Z Pool 2 2025-09-07T06:39:31.6199594Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6199921Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6200228Z Allocatable: TRUE 2025-09-07T06:39:31.6200547Z Alloc Granule: 4KB 2025-09-07T06:39:31.6200872Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6201208Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6201529Z Accessible by all: FALSE 2025-09-07T06:39:31.6201809Z Pool 3 2025-09-07T06:39:31.6202057Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6202496Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6202793Z Allocatable: TRUE 2025-09-07T06:39:31.6203112Z Alloc Granule: 4KB 2025-09-07T06:39:31.6203447Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6203778Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6204106Z Accessible by all: FALSE 2025-09-07T06:39:31.6204382Z Pool 4 2025-09-07T06:39:31.6204625Z Segment: GROUP 2025-09-07T06:39:31.6204914Z Size: 64(0x40) KB 2025-09-07T06:39:31.6205202Z Allocatable: FALSE 2025-09-07T06:39:31.6205519Z Alloc Granule: 0KB 2025-09-07T06:39:31.6205853Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6206183Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6206496Z Accessible by all: FALSE 2025-09-07T06:39:31.6206777Z ISA Info: 2025-09-07T06:39:31.6206978Z ISA 1 2025-09-07T06:39:31.6207234Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6207583Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6207912Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6208241Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6208569Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6208876Z Fast f16: TRUE 2025-09-07T06:39:31.6209183Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6209511Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6209778Z x 1024(0x400) 2025-09-07T06:39:31.6210034Z y 1024(0x400) 2025-09-07T06:39:31.6210288Z z 1024(0x400) 2025-09-07T06:39:31.6210579Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6210862Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6211097Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6211348Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6211602Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6211885Z FBarrier Max Size: 32 2025-09-07T06:39:31.6212164Z ******* 2025-09-07T06:39:31.6212353Z Agent 4 2025-09-07T06:39:31.6212673Z ******* 
2025-09-07T06:39:31.6212890Z Name: gfx90a 2025-09-07T06:39:31.6213178Z Uuid: GPU-58e51c85c53e7e04 2025-09-07T06:39:31.6213491Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6213799Z Vendor Name: AMD 2025-09-07T06:39:31.6214184Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6214479Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6214835Z Float Round Mode: NEAR 2025-09-07T06:39:31.6215145Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6215459Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6215781Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6216257Z Queue Type: MULTI 2025-09-07T06:39:31.6216552Z Node: 3 2025-09-07T06:39:31.6216827Z Device Type: GPU 2025-09-07T06:39:31.6217108Z Cache Info: 2025-09-07T06:39:31.6217331Z L1: 16(0x10) KB 2025-09-07T06:39:31.6217595Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6217869Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6218158Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6218471Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6218777Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6219068Z BDFID: 13568 2025-09-07T06:39:31.6219362Z Internal Node ID: 3 2025-09-07T06:39:31.6219687Z Compute Unit: 104 2025-09-07T06:39:31.6219974Z SIMDs per CU: 4 2025-09-07T06:39:31.6220281Z Shader Engines: 8 2025-09-07T06:39:31.6220599Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6220918Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6221243Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6221524Z Memory Properties: 2025-09-07T06:39:31.6221760Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6222048Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6222390Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6222704Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6222986Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6223240Z x 1024(0x400) 2025-09-07T06:39:31.6223489Z y 1024(0x400) 2025-09-07T06:39:31.6223738Z z 1024(0x400) 2025-09-07T06:39:31.6224009Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6224327Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6224641Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6224916Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6225137Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6225388Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6225661Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6225965Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6226523Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6226874Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6227189Z IOMMU Support:: None 2025-09-07T06:39:31.6227462Z Pool Info: 2025-09-07T06:39:31.6227666Z Pool 1 2025-09-07T06:39:31.6227929Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6228249Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6228572Z Allocatable: TRUE 2025-09-07T06:39:31.6228897Z Alloc Granule: 4KB 2025-09-07T06:39:31.6229243Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6229583Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6229904Z Accessible by all: FALSE 2025-09-07T06:39:31.6230313Z Pool 2 2025-09-07T06:39:31.6230562Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6230860Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6231146Z Allocatable: TRUE 2025-09-07T06:39:31.6231459Z Alloc Granule: 4KB 2025-09-07T06:39:31.6231783Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6232122Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6232456Z Accessible by all: FALSE 2025-09-07T06:39:31.6232732Z Pool 3 2025-09-07T06:39:31.6232997Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6233296Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6233621Z Allocatable: TRUE 2025-09-07T06:39:31.6233958Z Alloc Granule: 4KB 2025-09-07T06:39:31.6234286Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6234636Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6234964Z Accessible by all: FALSE 2025-09-07T06:39:31.6235254Z Pool 4 2025-09-07T06:39:31.6235494Z Segment: GROUP 2025-09-07T06:39:31.6235797Z Size: 64(0x40) KB 2025-09-07T06:39:31.6236098Z Allocatable: FALSE 2025-09-07T06:39:31.6236411Z Alloc Granule: 0KB 2025-09-07T06:39:31.6236756Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6237114Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6237472Z Accessible by all: FALSE 2025-09-07T06:39:31.6237779Z ISA Info: 2025-09-07T06:39:31.6237990Z ISA 1 2025-09-07T06:39:31.6238292Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6238640Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6238984Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6239309Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6239658Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6239974Z Fast f16: TRUE 2025-09-07T06:39:31.6240297Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6240605Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6240995Z x 1024(0x400) 2025-09-07T06:39:31.6241276Z y 1024(0x400) 2025-09-07T06:39:31.6241528Z z 1024(0x400) 2025-09-07T06:39:31.6241827Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6242111Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6242366Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6242623Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6242904Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6243231Z FBarrier Max Size: 32 2025-09-07T06:39:31.6243509Z ******* 2025-09-07T06:39:31.6243715Z Agent 5 2025-09-07T06:39:31.6243903Z ******* 
2025-09-07T06:39:31.6249822Z Name: gfx90a 2025-09-07T06:39:31.6250126Z Uuid: GPU-4add128351c0dde4 2025-09-07T06:39:31.6250460Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6250794Z Vendor Name: AMD 2025-09-07T06:39:31.6251105Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6251428Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6251745Z Float Round Mode: NEAR 2025-09-07T06:39:31.6252073Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6252387Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6252701Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6253004Z Queue Type: MULTI 2025-09-07T06:39:31.6253296Z Node: 4 2025-09-07T06:39:31.6253581Z Device Type: GPU 2025-09-07T06:39:31.6253927Z Cache Info: 2025-09-07T06:39:31.6254153Z L1: 16(0x10) KB 2025-09-07T06:39:31.6254412Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6254687Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6254975Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6255288Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6255597Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6255882Z BDFID: 4352 2025-09-07T06:39:31.6256176Z Internal Node ID: 4 2025-09-07T06:39:31.6256482Z Compute Unit: 104 2025-09-07T06:39:31.6256781Z SIMDs per CU: 4 2025-09-07T06:39:31.6257076Z Shader Engines: 8 2025-09-07T06:39:31.6257397Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6257717Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6258034Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6258318Z Memory Properties: 2025-09-07T06:39:31.6258539Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6258833Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6259155Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6259470Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6259761Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6260174Z x 1024(0x400) 2025-09-07T06:39:31.6260443Z y 1024(0x400) 2025-09-07T06:39:31.6260683Z z 1024(0x400) 2025-09-07T06:39:31.6260961Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6261273Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6261589Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6261861Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6262095Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6262363Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6262607Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6262899Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6263236Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6263720Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6264030Z IOMMU Support:: None 2025-09-07T06:39:31.6264304Z Pool Info: 2025-09-07T06:39:31.6264502Z Pool 1 2025-09-07T06:39:31.6264757Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6265068Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6265366Z Allocatable: TRUE 2025-09-07T06:39:31.6265684Z Alloc Granule: 4KB 2025-09-07T06:39:31.6266018Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6266359Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6266683Z Accessible by all: FALSE 2025-09-07T06:39:31.6266951Z Pool 2 2025-09-07T06:39:31.6267216Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6267536Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6267853Z Allocatable: TRUE 2025-09-07T06:39:31.6268173Z Alloc Granule: 4KB 2025-09-07T06:39:31.6268515Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6268850Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6269170Z Accessible by all: FALSE 2025-09-07T06:39:31.6269452Z Pool 3 2025-09-07T06:39:31.6269698Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6269996Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6270294Z Allocatable: TRUE 2025-09-07T06:39:31.6270616Z Alloc Granule: 4KB 2025-09-07T06:39:31.6270946Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6271278Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6271608Z Accessible by all: FALSE 2025-09-07T06:39:31.6271880Z Pool 4 2025-09-07T06:39:31.6272122Z Segment: GROUP 2025-09-07T06:39:31.6272405Z Size: 64(0x40) KB 2025-09-07T06:39:31.6272703Z Allocatable: FALSE 2025-09-07T06:39:31.6273015Z Alloc Granule: 0KB 2025-09-07T06:39:31.6273341Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6273666Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6274139Z Accessible by all: FALSE 2025-09-07T06:39:31.6274429Z ISA Info: 2025-09-07T06:39:31.6274623Z ISA 1 2025-09-07T06:39:31.6274885Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6275223Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6275554Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6275892Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6276231Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6276550Z Fast f16: TRUE 2025-09-07T06:39:31.6276853Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6277152Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6277409Z x 1024(0x400) 2025-09-07T06:39:31.6277830Z y 1024(0x400) 2025-09-07T06:39:31.6278081Z z 1024(0x400) 2025-09-07T06:39:31.6278360Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6278640Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6278869Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6279127Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6279379Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6279669Z FBarrier Max Size: 32 2025-09-07T06:39:31.6279941Z ******* 2025-09-07T06:39:31.6280129Z Agent 6 2025-09-07T06:39:31.6280315Z ******* 
2025-09-07T06:39:31.6280529Z Name: gfx90a 2025-09-07T06:39:31.6280832Z Uuid: GPU-04fbe3b4a00a45d1 2025-09-07T06:39:31.6281138Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6281459Z Vendor Name: AMD 2025-09-07T06:39:31.6281756Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6282062Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6282382Z Float Round Mode: NEAR 2025-09-07T06:39:31.6282701Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6283014Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6283316Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6283623Z Queue Type: MULTI 2025-09-07T06:39:31.6283905Z Node: 5 2025-09-07T06:39:31.6284210Z Device Type: GPU 2025-09-07T06:39:31.6284486Z Cache Info: 2025-09-07T06:39:31.6284713Z L1: 16(0x10) KB 2025-09-07T06:39:31.6284995Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6285256Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6285553Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6285860Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6286171Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6286463Z BDFID: 5120 2025-09-07T06:39:31.6286753Z Internal Node ID: 5 2025-09-07T06:39:31.6287062Z Compute Unit: 104 2025-09-07T06:39:31.6287497Z SIMDs per CU: 4 2025-09-07T06:39:31.6287808Z Shader Engines: 8 2025-09-07T06:39:31.6288115Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6288439Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6288766Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6289054Z Memory Properties: 2025-09-07T06:39:31.6289279Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6289563Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6289886Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6290199Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6290491Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6290728Z x 1024(0x400) 2025-09-07T06:39:31.6291114Z y 1024(0x400) 2025-09-07T06:39:31.6291367Z z 1024(0x400) 2025-09-07T06:39:31.6291638Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6291952Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6292262Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6292548Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6292771Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6293035Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6293298Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6293591Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6294002Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6294331Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6294658Z IOMMU Support:: None 2025-09-07T06:39:31.6294928Z Pool Info: 2025-09-07T06:39:31.6295140Z Pool 1 2025-09-07T06:39:31.6295396Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6295701Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6296012Z Allocatable: TRUE 2025-09-07T06:39:31.6296326Z Alloc Granule: 4KB 2025-09-07T06:39:31.6296667Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6297005Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6297357Z Accessible by all: FALSE 2025-09-07T06:39:31.6297639Z Pool 2 2025-09-07T06:39:31.6297903Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6298237Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6298538Z Allocatable: TRUE 2025-09-07T06:39:31.6298873Z Alloc Granule: 4KB 2025-09-07T06:39:31.6299216Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6299560Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6299890Z Accessible by all: FALSE 2025-09-07T06:39:31.6300183Z Pool 3 2025-09-07T06:39:31.6300448Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6300753Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6301060Z Allocatable: TRUE 2025-09-07T06:39:31.6301564Z Alloc Granule: 4KB 2025-09-07T06:39:31.6301923Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6302256Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6302612Z Accessible by all: FALSE 2025-09-07T06:39:31.6302903Z Pool 4 2025-09-07T06:39:31.6303150Z Segment: GROUP 2025-09-07T06:39:31.6303446Z Size: 64(0x40) KB 2025-09-07T06:39:31.6303737Z Allocatable: FALSE 2025-09-07T06:39:31.6304075Z Alloc Granule: 0KB 2025-09-07T06:39:31.6304403Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6304762Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6305099Z Accessible by all: FALSE 2025-09-07T06:39:31.6305536Z ISA Info: 2025-09-07T06:39:31.6305748Z ISA 1 2025-09-07T06:39:31.6306008Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6306379Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6306714Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6307053Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6307395Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6307703Z Fast f16: TRUE 2025-09-07T06:39:31.6308010Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6308304Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6308567Z x 1024(0x400) 2025-09-07T06:39:31.6308835Z y 1024(0x400) 2025-09-07T06:39:31.6309094Z z 1024(0x400) 2025-09-07T06:39:31.6309379Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6309658Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6309895Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6310147Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6310410Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6310697Z FBarrier Max Size: 32 2025-09-07T06:39:31.6310973Z ******* 2025-09-07T06:39:31.6311280Z Agent 7 2025-09-07T06:39:31.6311563Z ******* 
2025-09-07T06:39:31.6311783Z Name: gfx90a 2025-09-07T06:39:31.6312097Z Uuid: GPU-d5bef60d28576f7f 2025-09-07T06:39:31.6312426Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6312747Z Vendor Name: AMD 2025-09-07T06:39:31.6313064Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6313378Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6313688Z Float Round Mode: NEAR 2025-09-07T06:39:31.6314016Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6314320Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6314633Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6314935Z Queue Type: MULTI 2025-09-07T06:39:31.6315233Z Node: 6 2025-09-07T06:39:31.6315655Z Device Type: GPU 2025-09-07T06:39:31.6315944Z Cache Info: 2025-09-07T06:39:31.6316164Z L1: 16(0x10) KB 2025-09-07T06:39:31.6316426Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6316697Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6316990Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6317304Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6317613Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6317905Z BDFID: 44544 2025-09-07T06:39:31.6318204Z Internal Node ID: 6 2025-09-07T06:39:31.6318507Z Compute Unit: 104 2025-09-07T06:39:31.6318809Z SIMDs per CU: 4 2025-09-07T06:39:31.6319247Z Shader Engines: 8 2025-09-07T06:39:31.6319568Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6319889Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6320221Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6320517Z Memory Properties: 2025-09-07T06:39:31.6320741Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6321034Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6321348Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6321670Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6321961Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6322217Z x 1024(0x400) 2025-09-07T06:39:31.6322475Z y 1024(0x400) 2025-09-07T06:39:31.6322739Z z 1024(0x400) 2025-09-07T06:39:31.6323026Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6323340Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6323671Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6323945Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6324174Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6324432Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6324694Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6325002Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6325336Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6325668Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6325987Z IOMMU Support:: None 2025-09-07T06:39:31.6326276Z Pool Info: 2025-09-07T06:39:31.6326473Z Pool 1 2025-09-07T06:39:31.6326737Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6327050Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6327355Z Allocatable: TRUE 2025-09-07T06:39:31.6327679Z Alloc Granule: 4KB 2025-09-07T06:39:31.6328007Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6328348Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6328673Z Accessible by all: FALSE 2025-09-07T06:39:31.6328957Z Pool 2 2025-09-07T06:39:31.6329212Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6329654Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6329965Z Allocatable: TRUE 2025-09-07T06:39:31.6330282Z Alloc Granule: 4KB 2025-09-07T06:39:31.6330621Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6330955Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6331285Z Accessible by all: FALSE 2025-09-07T06:39:31.6331568Z Pool 3 2025-09-07T06:39:31.6331808Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6332113Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6332421Z Allocatable: TRUE 2025-09-07T06:39:31.6332741Z Alloc Granule: 4KB 2025-09-07T06:39:31.6333238Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6333579Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6333975Z Accessible by all: FALSE 2025-09-07T06:39:31.6334255Z Pool 4 2025-09-07T06:39:31.6334503Z Segment: GROUP 2025-09-07T06:39:31.6334788Z Size: 64(0x40) KB 2025-09-07T06:39:31.6335090Z Allocatable: FALSE 2025-09-07T06:39:31.6335402Z Alloc Granule: 0KB 2025-09-07T06:39:31.6335734Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6336069Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6336384Z Accessible by all: FALSE 2025-09-07T06:39:31.6336671Z ISA Info: 2025-09-07T06:39:31.6336872Z ISA 1 2025-09-07T06:39:31.6337134Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6337472Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6337802Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6338132Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6338474Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6338787Z Fast f16: TRUE 2025-09-07T06:39:31.6339096Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6339397Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6339647Z x 1024(0x400) 2025-09-07T06:39:31.6339913Z y 1024(0x400) 2025-09-07T06:39:31.6340168Z z 1024(0x400) 2025-09-07T06:39:31.6340458Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6340740Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6340968Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6341227Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6341481Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6341777Z FBarrier Max Size: 32 2025-09-07T06:39:31.6342049Z ******* 2025-09-07T06:39:31.6342241Z Agent 8 2025-09-07T06:39:31.6342428Z ******* 
2025-09-07T06:39:31.6342658Z Name: gfx90a 2025-09-07T06:39:31.6342960Z Uuid: GPU-8a1a325e7a817ddd 2025-09-07T06:39:31.6343436Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6343771Z Vendor Name: AMD 2025-09-07T06:39:31.6344084Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6344396Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6344705Z Float Round Mode: NEAR 2025-09-07T06:39:31.6345024Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6345341Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6345646Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6345947Z Queue Type: MULTI 2025-09-07T06:39:31.6346224Z Node: 7 2025-09-07T06:39:31.6346513Z Device Type: GPU 2025-09-07T06:39:31.6346773Z Cache Info: 2025-09-07T06:39:31.6347152Z L1: 16(0x10) KB 2025-09-07T06:39:31.6347418Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6347683Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6347982Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6348281Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6348602Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6348888Z BDFID: 45824 2025-09-07T06:39:31.6349183Z Internal Node ID: 7 2025-09-07T06:39:31.6349495Z Compute Unit: 104 2025-09-07T06:39:31.6349797Z SIMDs per CU: 4 2025-09-07T06:39:31.6350122Z Shader Engines: 8 2025-09-07T06:39:31.6350456Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6350785Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6351110Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6351411Z Memory Properties: 2025-09-07T06:39:31.6351639Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6351939Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6352276Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6352598Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6352899Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6353153Z x 1024(0x400) 2025-09-07T06:39:31.6353416Z y 1024(0x400) 2025-09-07T06:39:31.6353664Z z 1024(0x400) 2025-09-07T06:39:31.6353961Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6354281Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6354595Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6354878Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6355107Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6355374Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6355635Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6355943Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6356297Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6356627Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6356953Z IOMMU Support:: None 2025-09-07T06:39:31.6357233Z Pool Info: 2025-09-07T06:39:31.6357580Z Pool 1 2025-09-07T06:39:31.6357847Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6358171Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6358489Z Allocatable: TRUE 2025-09-07T06:39:31.6358809Z Alloc Granule: 4KB 2025-09-07T06:39:31.6359153Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6359500Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6359829Z Accessible by all: FALSE 2025-09-07T06:39:31.6360107Z Pool 2 2025-09-07T06:39:31.6360369Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6360670Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6361103Z Allocatable: TRUE 2025-09-07T06:39:31.6361425Z Alloc Granule: 4KB 2025-09-07T06:39:31.6361759Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6362102Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6362432Z Accessible by all: FALSE 2025-09-07T06:39:31.6362714Z Pool 3 2025-09-07T06:39:31.6362957Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6363258Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6363554Z Allocatable: TRUE 2025-09-07T06:39:31.6363865Z Alloc Granule: 4KB 2025-09-07T06:39:31.6364202Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6364537Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6364871Z Accessible by all: FALSE 2025-09-07T06:39:31.6365142Z Pool 4 2025-09-07T06:39:31.6365383Z Segment: GROUP 2025-09-07T06:39:31.6365670Z Size: 64(0x40) KB 2025-09-07T06:39:31.6365957Z Allocatable: FALSE 2025-09-07T06:39:31.6366277Z Alloc Granule: 0KB 2025-09-07T06:39:31.6366605Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6366941Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6367258Z Accessible by all: FALSE 2025-09-07T06:39:31.6367537Z ISA Info: 2025-09-07T06:39:31.6367741Z ISA 1 2025-09-07T06:39:31.6368007Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6368357Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6368685Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6369021Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6369358Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6369673Z Fast f16: TRUE 2025-09-07T06:39:31.6369990Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6370287Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6370554Z x 1024(0x400) 2025-09-07T06:39:31.6370813Z y 1024(0x400) 2025-09-07T06:39:31.6371078Z z 1024(0x400) 2025-09-07T06:39:31.6371511Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6371805Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6372039Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6384470Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6384788Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6385107Z FBarrier Max Size: 32 2025-09-07T06:39:31.6385395Z ******* 2025-09-07T06:39:31.6385579Z Agent 9 2025-09-07T06:39:31.6385770Z ******* 
2025-09-07T06:39:31.6385998Z Name: gfx90a 2025-09-07T06:39:31.6386316Z Uuid: GPU-26deaaad0d24bc07 2025-09-07T06:39:31.6386634Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6387253Z Vendor Name: AMD 2025-09-07T06:39:31.6387571Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6387875Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6388192Z Float Round Mode: NEAR 2025-09-07T06:39:31.6388505Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6388832Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6389133Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6389442Z Queue Type: MULTI 2025-09-07T06:39:31.6389730Z Node: 8 2025-09-07T06:39:31.6390014Z Device Type: GPU 2025-09-07T06:39:31.6390306Z Cache Info: 2025-09-07T06:39:31.6390534Z L1: 16(0x10) KB 2025-09-07T06:39:31.6390829Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6391114Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6391431Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6391756Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6392088Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6392415Z BDFID: 36352 2025-09-07T06:39:31.6392708Z Internal Node ID: 8 2025-09-07T06:39:31.6393019Z Compute Unit: 104 2025-09-07T06:39:31.6393313Z SIMDs per CU: 4 2025-09-07T06:39:31.6393621Z Shader Engines: 8 2025-09-07T06:39:31.6393946Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6394281Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6394628Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6394922Z Memory Properties: 2025-09-07T06:39:31.6395173Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6395473Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6395799Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6396121Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6396413Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6396668Z x 1024(0x400) 2025-09-07T06:39:31.6396934Z y 1024(0x400) 2025-09-07T06:39:31.6397196Z z 1024(0x400) 2025-09-07T06:39:31.6397477Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6397970Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6398286Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6398569Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6398793Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6399057Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6399320Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6399614Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6399970Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6400292Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6400608Z IOMMU Support:: None 2025-09-07T06:39:31.6400884Z Pool Info: 2025-09-07T06:39:31.6401087Z Pool 1 2025-09-07T06:39:31.6401588Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6401897Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6402200Z Allocatable: TRUE 2025-09-07T06:39:31.6402517Z Alloc Granule: 4KB 2025-09-07T06:39:31.6402853Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6403191Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6403510Z Accessible by all: FALSE 2025-09-07T06:39:31.6403784Z Pool 2 2025-09-07T06:39:31.6404028Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6404341Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6404645Z Allocatable: TRUE 2025-09-07T06:39:31.6404981Z Alloc Granule: 4KB 2025-09-07T06:39:31.6405314Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6405649Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6405980Z Accessible by all: FALSE 2025-09-07T06:39:31.6406252Z Pool 3 2025-09-07T06:39:31.6406501Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6406798Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6407090Z Allocatable: TRUE 2025-09-07T06:39:31.6407406Z Alloc Granule: 4KB 2025-09-07T06:39:31.6407730Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6408068Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6408391Z Accessible by all: FALSE 2025-09-07T06:39:31.6408667Z Pool 4 2025-09-07T06:39:31.6408896Z Segment: GROUP 2025-09-07T06:39:31.6409180Z Size: 64(0x40) KB 2025-09-07T06:39:31.6409470Z Allocatable: FALSE 2025-09-07T06:39:31.6409789Z Alloc Granule: 0KB 2025-09-07T06:39:31.6410120Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6410448Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6410768Z Accessible by all: FALSE 2025-09-07T06:39:31.6411040Z ISA Info: 2025-09-07T06:39:31.6411244Z ISA 1 2025-09-07T06:39:31.6411638Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6412002Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6412334Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6412652Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6412983Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6413285Z Fast f16: TRUE 2025-09-07T06:39:31.6413592Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6414004Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6414266Z x 1024(0x400) 2025-09-07T06:39:31.6414527Z y 1024(0x400) 2025-09-07T06:39:31.6414772Z z 1024(0x400) 2025-09-07T06:39:31.6415060Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6415506Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6415744Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6415997Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6416253Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6416541Z FBarrier Max Size: 32 2025-09-07T06:39:31.6416808Z ******* 2025-09-07T06:39:31.6417000Z Agent 10 2025-09-07T06:39:31.6417182Z ******* 
2025-09-07T06:39:31.6417394Z Name: gfx90a 2025-09-07T06:39:31.6417680Z Uuid: GPU-750f6a6a723531b8 2025-09-07T06:39:31.6417991Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:39:31.6418301Z Vendor Name: AMD 2025-09-07T06:39:31.6418616Z Feature: KERNEL_DISPATCH 2025-09-07T06:39:31.6418916Z Profile: BASE_PROFILE 2025-09-07T06:39:31.6419222Z Float Round Mode: NEAR 2025-09-07T06:39:31.6419537Z Max Queue Number: 128(0x80) 2025-09-07T06:39:31.6419838Z Queue Min Size: 64(0x40) 2025-09-07T06:39:31.6420140Z Queue Max Size: 131072(0x20000) 2025-09-07T06:39:31.6420436Z Queue Type: MULTI 2025-09-07T06:39:31.6420717Z Node: 9 2025-09-07T06:39:31.6421009Z Device Type: GPU 2025-09-07T06:39:31.6421270Z Cache Info: 2025-09-07T06:39:31.6421491Z L1: 16(0x10) KB 2025-09-07T06:39:31.6421752Z L2: 8192(0x2000) KB 2025-09-07T06:39:31.6422026Z Chip ID: 29708(0x740c) 2025-09-07T06:39:31.6422316Z ASIC Revision: 1(0x1) 2025-09-07T06:39:31.6422642Z Cacheline Size: 128(0x80) 2025-09-07T06:39:31.6422956Z Max Clock Freq. (MHz): 1700 2025-09-07T06:39:31.6423245Z BDFID: 37632 2025-09-07T06:39:31.6423535Z Internal Node ID: 9 2025-09-07T06:39:31.6423837Z Compute Unit: 104 2025-09-07T06:39:31.6424134Z SIMDs per CU: 4 2025-09-07T06:39:31.6424433Z Shader Engines: 8 2025-09-07T06:39:31.6424749Z Shader Arrs. per Eng.: 1 2025-09-07T06:39:31.6425246Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:39:31.6425576Z Coherent Host Access: FALSE 2025-09-07T06:39:31.6425858Z Memory Properties: 2025-09-07T06:39:31.6426082Z Features: KERNEL_DISPATCH 2025-09-07T06:39:31.6426365Z Fast F16 Operation: TRUE 2025-09-07T06:39:31.6426675Z Wavefront Size: 64(0x40) 2025-09-07T06:39:31.6426987Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6427266Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6427504Z x 1024(0x400) 2025-09-07T06:39:31.6427761Z y 1024(0x400) 2025-09-07T06:39:31.6428004Z z 1024(0x400) 2025-09-07T06:39:31.6428296Z Max Waves Per CU: 32(0x20) 2025-09-07T06:39:31.6428612Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:39:31.6429056Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6429330Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6429555Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6429816Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6430065Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6430366Z Max fbarriers/Workgrp: 32 2025-09-07T06:39:31.6430704Z Packet Processor uCode:: 92 2025-09-07T06:39:31.6431032Z SDMA engine uCode:: 9 2025-09-07T06:39:31.6431344Z IOMMU Support:: None 2025-09-07T06:39:31.6431622Z Pool Info: 2025-09-07T06:39:31.6431824Z Pool 1 2025-09-07T06:39:31.6432079Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:39:31.6432400Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6432704Z Allocatable: TRUE 2025-09-07T06:39:31.6433021Z Alloc Granule: 4KB 2025-09-07T06:39:31.6433351Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6433691Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6434011Z Accessible by all: FALSE 2025-09-07T06:39:31.6434293Z Pool 2 2025-09-07T06:39:31.6434545Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:39:31.6434845Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6435143Z Allocatable: TRUE 2025-09-07T06:39:31.6435456Z Alloc Granule: 4KB 2025-09-07T06:39:31.6435792Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6436121Z Alloc Alignment: 4KB 
2025-09-07T06:39:31.6436444Z Accessible by all: FALSE 2025-09-07T06:39:31.6436723Z Pool 3 2025-09-07T06:39:31.6436965Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:39:31.6437264Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:39:31.6437557Z Allocatable: TRUE 2025-09-07T06:39:31.6437872Z Alloc Granule: 4KB 2025-09-07T06:39:31.6438204Z Alloc Recommended Granule:2048KB 2025-09-07T06:39:31.6438540Z Alloc Alignment: 4KB 2025-09-07T06:39:31.6439000Z Accessible by all: FALSE 2025-09-07T06:39:31.6439284Z Pool 4 2025-09-07T06:39:31.6439522Z Segment: GROUP 2025-09-07T06:39:31.6439804Z Size: 64(0x40) KB 2025-09-07T06:39:31.6440105Z Allocatable: FALSE 2025-09-07T06:39:31.6440409Z Alloc Granule: 0KB 2025-09-07T06:39:31.6440734Z Alloc Recommended Granule:0KB 2025-09-07T06:39:31.6441062Z Alloc Alignment: 0KB 2025-09-07T06:39:31.6441376Z Accessible by all: FALSE 2025-09-07T06:39:31.6441650Z ISA Info: 2025-09-07T06:39:31.6441837Z ISA 1 2025-09-07T06:39:31.6442087Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:39:31.6442429Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:39:31.6442887Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:39:31.6443210Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6443533Z Default Rounding Mode: NEAR 2025-09-07T06:39:31.6443836Z Fast f16: TRUE 2025-09-07T06:39:31.6444132Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:39:31.6444423Z Workgroup Max Size per Dimension: 2025-09-07T06:39:31.6444675Z x 1024(0x400) 2025-09-07T06:39:31.6444933Z y 1024(0x400) 2025-09-07T06:39:31.6445182Z z 1024(0x400) 2025-09-07T06:39:31.6445456Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:39:31.6445748Z Grid Max Size per Dimension: 2025-09-07T06:39:31.6445977Z x 4294967295(0xffffffff) 2025-09-07T06:39:31.6446233Z y 4294967295(0xffffffff) 2025-09-07T06:39:31.6446484Z z 4294967295(0xffffffff) 2025-09-07T06:39:31.6446771Z FBarrier Max Size: 32 2025-09-07T06:39:31.6447046Z *** Done *** 2025-09-07T06:39:31.6467746Z ##[group]Run ngpu=$(rocminfo | grep -c -E 
'Name:.*\sgfx') 2025-09-07T06:39:31.6468135Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-09-07T06:39:31.6468752Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" 2025-09-07T06:39:31.6469337Z if [[ $ngpu -eq 0 ]]; then 2025-09-07T06:39:31.6469641Z  echo "Error: Failed to detect any GPUs on the runner" 2025-09-07T06:39:31.6469948Z  echo "$msg" 2025-09-07T06:39:31.6470151Z  exit 1 2025-09-07T06:39:31.6470325Z fi 2025-09-07T06:39:31.6508348Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.6508672Z env: 2025-09-07T06:39:31.6508842Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.6509049Z ##[endgroup] 2025-09-07T06:39:31.8079906Z ##[group]Run pytorch/pytorch/.github/actions/diskspace-cleanup@main 2025-09-07T06:39:31.8080255Z with: 2025-09-07T06:39:31.8080463Z diskspace-cutoff: 70 2025-09-07T06:39:31.8080655Z env: 2025-09-07T06:39:31.8080830Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.8081034Z ##[endgroup] 2025-09-07T06:39:31.8120126Z ##[group]Run set -ex 2025-09-07T06:39:31.8120413Z set -ex 2025-09-07T06:39:31.8120609Z diskspace_cutoff=70 2025-09-07T06:39:31.8120936Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-09-07T06:39:31.8121281Z if [ ! -d "$docker_root_dir" ]; then 2025-09-07T06:39:31.8121979Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-09-07T06:39:31.8122410Z  exit 0 2025-09-07T06:39:31.8122602Z fi 2025-09-07T06:39:31.8122962Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-09-07T06:39:31.8123703Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-09-07T06:39:31.8124341Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-09-07T06:39:31.8124652Z  docker system prune -af 2025-09-07T06:39:31.8125073Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-09-07T06:39:31.8125572Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-09-07T06:39:31.8126087Z  echo "Error: Available diskspace is less than $diskspace_cutoff percent. Not enough diskspace." 2025-09-07T06:39:31.8126752Z  echo "$msg" 2025-09-07T06:39:31.8126963Z  exit 1 2025-09-07T06:39:31.8127165Z  else 2025-09-07T06:39:31.8127397Z  difference=$((diskspace - diskspace_new)) 2025-09-07T06:39:31.8127759Z  echo "Diskspace saved: $difference percent" 2025-09-07T06:39:31.8128054Z  fi 2025-09-07T06:39:31.8128229Z fi 2025-09-07T06:39:31.8165856Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.8166183Z env: 2025-09-07T06:39:31.8166359Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.8166579Z ##[endgroup] 2025-09-07T06:39:31.8224229Z + diskspace_cutoff=70 2025-09-07T06:39:31.8228457Z ++ docker info -f '{{.DockerRootDir}}' 2025-09-07T06:39:31.8796382Z + docker_root_dir=/media/4TB/docker-rootless 2025-09-07T06:39:31.8796823Z + '[' '!' -d /media/4TB/docker-rootless ']' 2025-09-07T06:39:31.8810787Z ++ df -H --output=pcent /media/4TB/docker-rootless 2025-09-07T06:39:31.8815039Z ++ sed -n 2p 2025-09-07T06:39:31.8815483Z ++ sed s/%// 2025-09-07T06:39:31.8818089Z ++ sed 's/ //' 2025-09-07T06:39:31.8844595Z + diskspace=33 2025-09-07T06:39:31.8845565Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified' 2025-09-07T06:39:31.8846567Z + [[ 33 -ge 70 ]] 2025-09-07T06:39:31.8878608Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-09-07T06:39:31.8879013Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-09-07T06:39:31.8879328Z rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-09-07T06:39:31.8879602Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-09-07T06:39:31.8879967Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-09-07T06:39:31.8880308Z  2025-09-07T06:39:31.8880554Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-09-07T06:39:31.8880887Z rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-09-07T06:39:31.8881210Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-09-07T06:39:31.8881595Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-09-07T06:39:31.8881941Z  2025-09-07T06:39:31.8882128Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-09-07T06:39:31.8882385Z rm -rf "${RUNNER_DOCS_DIR}" 2025-09-07T06:39:31.8882633Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-09-07T06:39:31.8882947Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-09-07T06:39:31.8917513Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.8917840Z env: 2025-09-07T06:39:31.8918007Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.8918211Z ##[endgroup] 2025-09-07T06:39:31.9123069Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:39:31.9123601Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:39:31.9124336Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:39:31.9162938Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.9163283Z env: 2025-09-07T06:39:31.9163470Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.9163847Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:31.9164398Z 
RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:31.9164887Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:31.9165234Z ##[endgroup] 2025-09-07T06:39:31.9303076Z ##[group]Run # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-09-07T06:39:31.9303775Z # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-09-07T06:39:31.9304248Z # Add render group for container creation. 2025-09-07T06:39:31.9304662Z render_gid=`cat /etc/group | grep render | cut -d: -f3` 2025-09-07T06:39:31.9305396Z # Ensure GPU isolation if pod is part of kubernetes setup with DEVICE_FLAG. 2025-09-07T06:39:31.9305871Z if [ -f "/etc/podinfo/gha-render-devices" ]; then 2025-09-07T06:39:31.9306272Z  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices) 2025-09-07T06:39:31.9306602Z else 2025-09-07T06:39:31.9306818Z  DEVICE_FLAG="--device /dev/dri" 2025-09-07T06:39:31.9307072Z fi 2025-09-07T06:39:31.9307501Z # The --group-add daemon and --group-add bin are needed in the Ubuntu 24.04 and Almalinux OSs respectively. 2025-09-07T06:39:31.9308174Z # This is due to the device files (/dev/kfd & /dev/dri) being owned by video group on bare metal. 2025-09-07T06:39:31.9308813Z # This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries. 2025-09-07T06:39:31.9309485Z # The group name corresponding to group ID 1 can change depending on the OS, so both are necessary. 
2025-09-07T06:39:31.9310549Z echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}" 2025-09-07T06:39:31.9348995Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:31.9349327Z env: 2025-09-07T06:39:31.9349499Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.9349862Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:31.9350379Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:31.9350899Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:31.9351248Z ##[endgroup] 2025-09-07T06:39:31.9517340Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-09-07T06:39:31.9517808Z with: 2025-09-07T06:39:31.9518144Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only 2025-09-07T06:39:31.9518528Z aws-region: us-east-1 2025-09-07T06:39:31.9518747Z role-duration-seconds: 18000 2025-09-07T06:39:31.9518999Z audience: sts.amazonaws.com 2025-09-07T06:39:31.9519210Z env: 2025-09-07T06:39:31.9519386Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:31.9519739Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:31.9520273Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:31.9520782Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:31.9521644Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:31.9522365Z ##[endgroup] 2025-09-07T06:39:32.2403435Z Assuming role with OIDC 2025-09-07T06:39:32.4151564Z 
Authenticated as assumedRoleId AROAUPVRELQNLLCOPFEJR:GitHubActions 2025-09-07T06:39:32.4755248Z ##[group]Run aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 2025-09-07T06:39:32.4755704Z with: 2025-09-07T06:39:32.4755908Z mask-password: true 2025-09-07T06:39:32.4756149Z registry-type: private 2025-09-07T06:39:32.4756389Z skip-logout: false 2025-09-07T06:39:32.4756600Z env: 2025-09-07T06:39:32.4756784Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:32.4757158Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:32.4757700Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:32.4758213Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:32.4759068Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:32.4760111Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:32.4760414Z AWS_REGION: us-east-1 2025-09-07T06:39:32.4761334Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:32.4761716Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:32.4766743Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:32.4766969Z ##[endgroup] 2025-09-07T06:39:32.8837794Z Logging into registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:33.1751296Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-09-07T06:39:33.1751894Z with: 2025-09-07T06:39:33.1752733Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.1753648Z use-custom-docker-registry: true 2025-09-07T06:39:33.1754052Z docker-build-dir: .ci/docker 2025-09-07T06:39:33.1754441Z docker-build-script: ./build.sh 2025-09-07T06:39:33.1754915Z working-directory: . 
2025-09-07T06:39:33.1755378Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:33.1755909Z force-push: false 2025-09-07T06:39:33.1756252Z env: 2025-09-07T06:39:33.1756579Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:33.1757118Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:33.1757737Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:33.1758665Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:33.1759680Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:33.1760568Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:33.1761011Z AWS_REGION: us-east-1 2025-09-07T06:39:33.1761480Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:33.1761918Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:33.1767209Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:33.1767532Z ##[endgroup] 2025-09-07T06:39:33.1795036Z ##[group]Run set -ex 2025-09-07T06:39:33.1795540Z set -ex 2025-09-07T06:39:33.1795851Z  2025-09-07T06:39:33.1796273Z # If the docker build directory or the build script doesn't exist, the action will 2025-09-07T06:39:33.1796855Z # gracefully return the docker image name as it is. 
Pulling docker image in Linux 2025-09-07T06:39:33.1797349Z # job could then download the pre-built image as usual 2025-09-07T06:39:33.1797927Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-09-07T06:39:33.1798460Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1798750Z else 2025-09-07T06:39:33.1798977Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1799357Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1799706Z  2025-09-07T06:39:33.1800187Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-09-07T06:39:33.1800723Z  exit 0 2025-09-07T06:39:33.1800915Z fi 2025-09-07T06:39:33.1801096Z  2025-09-07T06:39:33.1801392Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-09-07T06:39:33.1801903Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-09-07T06:39:33.1802372Z  # use it as it is, but first let's extract the tag 2025-09-07T06:39:33.1802781Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-09-07T06:39:33.1803213Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1803633Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1804216Z else 2025-09-07T06:39:33.1804447Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-09-07T06:39:33.1804775Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-09-07T06:39:33.1805105Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-09-07T06:39:33.1805384Z  fi 2025-09-07T06:39:33.1809249Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-09-07T06:39:33.1809802Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1810339Z  echo 
"docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1810915Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.1811274Z fi 2025-09-07T06:39:33.1852377Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:33.1852718Z env: 2025-09-07T06:39:33.1852923Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:33.1853292Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:33.1853913Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:33.1854421Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:33.1855260Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:33.1856010Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:33.1856265Z AWS_REGION: us-east-1 2025-09-07T06:39:33.1856551Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:33.1856883Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:33.1861871Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:33.1862111Z REPO_NAME: pytorch 2025-09-07T06:39:33.1862736Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.1863405Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T06:39:33.1863656Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-09-07T06:39:33.1863983Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:33.1864334Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-09-07T06:39:33.1864592Z CUSTOM_TAG_PREFIX: 2025-09-07T06:39:33.1864813Z ##[endgroup] 2025-09-07T06:39:33.1923989Z + [[ -d .ci/docker ]] 2025-09-07T06:39:33.1924303Z + [[ -f .ci/docker/./build.sh ]] 2025-09-07T06:39:33.1924596Z + [[ true == \t\r\u\e ]] 
2025-09-07T06:39:33.1924842Z + echo skip=false 2025-09-07T06:39:33.1925791Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-09-07T06:39:33.1935606Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.1936742Z ++ awk -F '[:,]' '{print $2}' 2025-09-07T06:39:33.1970005Z + DOCKER_TAG=pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.1971093Z + echo docker-tag=pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.1972594Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.2004736Z ##[group]Run set +e 2025-09-07T06:39:33.2005019Z set +e 2025-09-07T06:39:33.2005225Z set -x 2025-09-07T06:39:33.2005413Z  2025-09-07T06:39:33.2005599Z login() { 2025-09-07T06:39:33.2006037Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T06:39:33.2006495Z } 2025-09-07T06:39:33.2006683Z  2025-09-07T06:39:33.2007111Z retry () { 2025-09-07T06:39:33.2007361Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T06:39:33.2007657Z } 2025-09-07T06:39:33.2007845Z  2025-09-07T06:39:33.2008053Z retry login "${DOCKER_REGISTRY}" 2025-09-07T06:39:33.2008322Z  2025-09-07T06:39:33.2008512Z START_TIME=$(date +%s) 2025-09-07T06:39:33.2008980Z # Wait up to 120 minutes 2025-09-07T06:39:33.2009308Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-09-07T06:39:33.2009730Z  # Check if image already exists, if it does then skip building it 2025-09-07T06:39:33.2010149Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 
2025-09-07T06:39:33.2010465Z  exit 0 2025-09-07T06:39:33.2010680Z  fi 2025-09-07T06:39:33.2010872Z  2025-09-07T06:39:33.2011205Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-09-07T06:39:33.2011779Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-09-07T06:39:33.2012327Z  # latter, it will wait for the Docker images to become available before continuing 2025-09-07T06:39:33.2012770Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-09-07T06:39:33.2013122Z  # It's a Docker build job, let's build the image 2025-09-07T06:39:33.2013430Z  break 2025-09-07T06:39:33.2013640Z  else 2025-09-07T06:39:33.2014018Z  # It's a regular build job, wait for the image to become available 2025-09-07T06:39:33.2014379Z  sleep 300 2025-09-07T06:39:33.2014596Z  fi 2025-09-07T06:39:33.2014789Z done 2025-09-07T06:39:33.2014976Z  2025-09-07T06:39:33.2015278Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-09-07T06:39:33.2015765Z # be empty. The default action would be to continue rebuild the image 2025-09-07T06:39:33.2016205Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-09-07T06:39:33.2016590Z  # if we're on the base branch then use the parent commit 2025-09-07T06:39:33.2016929Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-09-07T06:39:33.2017195Z else 2025-09-07T06:39:33.2017478Z  # otherwise we're on a PR, so use the most recent base commit 2025-09-07T06:39:33.2017880Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-09-07T06:39:33.2018194Z fi 2025-09-07T06:39:33.2018376Z  2025-09-07T06:39:33.2018589Z if [[ -z "${MERGE_BASE}" ]]; then 2025-09-07T06:39:33.2018897Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.2019182Z  2025-09-07T06:39:33.2019588Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 
2025-09-07T06:39:33.2020045Z  exit 0 2025-09-07T06:39:33.2020246Z fi 2025-09-07T06:39:33.2020426Z  2025-09-07T06:39:33.2020689Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-09-07T06:39:33.2021246Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-09-07T06:39:33.2021731Z  exit 1 2025-09-07T06:39:33.2021923Z fi 2025-09-07T06:39:33.2022120Z  2025-09-07T06:39:33.2022428Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-09-07T06:39:33.2022980Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-09-07T06:39:33.2023469Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-09-07T06:39:33.2024025Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-09-07T06:39:33.2024649Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-09-07T06:39:33.2025202Z fi 2025-09-07T06:39:33.2025387Z  2025-09-07T06:39:33.2025615Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-09-07T06:39:33.2061764Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:33.2062106Z env: 2025-09-07T06:39:33.2062322Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:33.2062885Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:33.2063456Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:33.2063958Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:33.2064795Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:33.2065561Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:33.2065822Z AWS_REGION: us-east-1 
2025-09-07T06:39:33.2066132Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:33.2066480Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:33.2071447Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:33.2071685Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T06:39:33.2071989Z BASE_REVISION: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:39:33.2072705Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.2073521Z DOCKER_TAG: pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:33.2074032Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:33.2074369Z DOCKER_PUSH: 2025-09-07T06:39:33.2074577Z ##[endgroup] 2025-09-07T06:39:33.2134085Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:33.2134544Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:33.2139293Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:39:33.2141083Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:34.1948031Z WARNING! Your password will be stored unencrypted in /var/home/pytorchci/.docker/config.json. 2025-09-07T06:39:34.1949107Z Configure a credential helper to remove this warning. 
See 2025-09-07T06:39:34.1949998Z https://docs.docker.com/engine/reference/commandline/login/#credential-stores 2025-09-07T06:39:34.1950639Z 2025-09-07T06:39:34.1951631Z Login Succeeded 2025-09-07T06:39:34.1989839Z ++ date +%s 2025-09-07T06:39:34.2003562Z + START_TIME=1757227174 2025-09-07T06:39:34.2010379Z ++ date +%s 2025-09-07T06:39:34.2025541Z + [[ 1757219974 -lt 1757227174 ]] 2025-09-07T06:39:34.2026359Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:34.8617257Z { 2025-09-07T06:39:34.8617634Z "schemaVersion": 2, 2025-09-07T06:39:34.8618141Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-09-07T06:39:34.8618645Z "config": { 2025-09-07T06:39:34.8619061Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-09-07T06:39:34.8619488Z "size": 28694, 2025-09-07T06:39:34.8619915Z "digest": "sha256:286241da8837146a38c2d15035dbd9c40a82a02e849dd96783e9e483017209fc" 2025-09-07T06:39:34.8620416Z }, 2025-09-07T06:39:34.8620634Z "layers": [ 2025-09-07T06:39:34.8620848Z { 2025-09-07T06:39:34.8621208Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8621654Z "size": 30448359, 2025-09-07T06:39:34.8622103Z "digest": "sha256:e6fdc8487bfe6d764301ef3634bc6c043841dc3ab05ca14f81e69c0f92562d46" 2025-09-07T06:39:34.8622592Z }, 2025-09-07T06:39:34.8622807Z { 2025-09-07T06:39:34.8623201Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8623683Z "size": 1554, 2025-09-07T06:39:34.8624363Z "digest": "sha256:316899e31961f856e0f2bd0fe5694db576cf915de748322bde5b9807fb9141ac" 2025-09-07T06:39:34.8624822Z }, 2025-09-07T06:39:34.8625012Z { 2025-09-07T06:39:34.8625321Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8625724Z "size": 313297866, 2025-09-07T06:39:34.8626131Z "digest": 
"sha256:4de005a5e0959467985d4fc5155eda7af45ad760aacd5953951481e2997a146c" 2025-09-07T06:39:34.8626768Z }, 2025-09-07T06:39:34.8626982Z { 2025-09-07T06:39:34.8627316Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8627740Z "size": 704, 2025-09-07T06:39:34.8628164Z "digest": "sha256:167eb56e0e1e0549ded83febeb966920d3a663bf54639bc7f6310e90fdc4f345" 2025-09-07T06:39:34.8628636Z }, 2025-09-07T06:39:34.8628827Z { 2025-09-07T06:39:34.8629140Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8629532Z "size": 1219, 2025-09-07T06:39:34.8629916Z "digest": "sha256:02e451948e19451d4f23e03e19f3189a0e5211f4540a8853328b8661261408ba" 2025-09-07T06:39:34.8630365Z }, 2025-09-07T06:39:34.8630555Z { 2025-09-07T06:39:34.8630866Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8631264Z "size": 485, 2025-09-07T06:39:34.8631648Z "digest": "sha256:76c661101f33e47a67ddf88187ac81725450656b25ce1628c44c3cd142a8a0a0" 2025-09-07T06:39:34.8632092Z }, 2025-09-07T06:39:34.8632279Z { 2025-09-07T06:39:34.8632587Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8632985Z "size": 110343593, 2025-09-07T06:39:34.8633374Z "digest": "sha256:b69885b6894e5c507bc07d6a968d9e7c8737263330181813ac670d47db9fae39" 2025-09-07T06:39:34.8633788Z }, 2025-09-07T06:39:34.8633958Z { 2025-09-07T06:39:34.8634239Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8634603Z "size": 4211, 2025-09-07T06:39:34.8634949Z "digest": "sha256:6e11c09f28914a8e56fdf6763adf2ce72725ae0a730f4806b21dc52d88da554d" 2025-09-07T06:39:34.8635398Z }, 2025-09-07T06:39:34.8635583Z { 2025-09-07T06:39:34.8635876Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8636231Z "size": 1709, 2025-09-07T06:39:34.8636577Z "digest": "sha256:4c8d96980a58f8a721c371b44f6436a6d926f8a1b010ca41d6c1373d3191fc05" 2025-09-07T06:39:34.8636964Z }, 
2025-09-07T06:39:34.8637134Z { 2025-09-07T06:39:34.8637415Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8637757Z "size": 724, 2025-09-07T06:39:34.8638092Z "digest": "sha256:818552e09965a162f3b2717caec1c0bd350b438f27bcd1aa017c3631f3d86aa1" 2025-09-07T06:39:34.8638474Z }, 2025-09-07T06:39:34.8638627Z { 2025-09-07T06:39:34.8638896Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8639241Z "size": 3241446542, 2025-09-07T06:39:34.8639595Z "digest": "sha256:d41c3ca7aea9e00e6551873a813eae016e15e041f7e91d6d4d59186ecd2f273a" 2025-09-07T06:39:34.8639994Z }, 2025-09-07T06:39:34.8640160Z { 2025-09-07T06:39:34.8640429Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8640770Z "size": 380, 2025-09-07T06:39:34.8641109Z "digest": "sha256:e1ab50b7b361f99f5aeb79592525d1ed2c77ebc6c395c78dde5ade730393129b" 2025-09-07T06:39:34.8641490Z }, 2025-09-07T06:39:34.8641644Z { 2025-09-07T06:39:34.8641913Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8642254Z "size": 235279, 2025-09-07T06:39:34.8642591Z "digest": "sha256:7590e23c16ed382e0d34bceb76b2629d1c32bf6a541321455995388e88ff2ee0" 2025-09-07T06:39:34.8642966Z }, 2025-09-07T06:39:34.8643128Z { 2025-09-07T06:39:34.8643389Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8643722Z "size": 791, 2025-09-07T06:39:34.8644061Z "digest": "sha256:e04fdb0d2cf2b4d50e36ff38815c9662afa93af8c2f6654e290b8205d0694f1b" 2025-09-07T06:39:34.8644462Z }, 2025-09-07T06:39:34.8644810Z { 2025-09-07T06:39:34.8645066Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8645399Z "size": 106, 2025-09-07T06:39:34.8645750Z "digest": "sha256:75b382e1cce01447d0ec442543e32d434c80ce08c6a7dad4d51feea1d252db79" 2025-09-07T06:39:34.8646134Z }, 2025-09-07T06:39:34.8646295Z { 2025-09-07T06:39:34.8646561Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8647034Z "size": 1495, 2025-09-07T06:39:34.8647368Z "digest": "sha256:4757ad898455b30c6bb94908a4f53ef3509a750de04237762e40d4933479d8b2" 2025-09-07T06:39:34.8647742Z }, 2025-09-07T06:39:34.8647910Z { 2025-09-07T06:39:34.8648170Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8648505Z "size": 454405422, 2025-09-07T06:39:34.8648855Z "digest": "sha256:669a45c95698fcbd5c62bdb2286ccf82b1225f791dbe6fc37c6f001605187b67" 2025-09-07T06:39:34.8649237Z }, 2025-09-07T06:39:34.8649400Z { 2025-09-07T06:39:34.8649668Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8650002Z "size": 163, 2025-09-07T06:39:34.8650337Z "digest": "sha256:c98637e362e4cc0727f9d1cb21414abb98ca18acf9389c9ed00c0ef0ebad026e" 2025-09-07T06:39:34.8650732Z }, 2025-09-07T06:39:34.8667848Z { 2025-09-07T06:39:34.8668196Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8668641Z "size": 2484, 2025-09-07T06:39:34.8669078Z "digest": "sha256:db28c0872d6878c34e09824a7a63546e86f6307bcc1525391565cf04ad58a391" 2025-09-07T06:39:34.8669488Z }, 2025-09-07T06:39:34.8669653Z { 2025-09-07T06:39:34.8669945Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8670311Z "size": 8102122719, 2025-09-07T06:39:34.8670676Z "digest": "sha256:2665bdff0cdf89b4e861653b9d587a2aa7ede0480d071064a67bf09e3a961a12" 2025-09-07T06:39:34.8671065Z }, 2025-09-07T06:39:34.8671232Z { 2025-09-07T06:39:34.8671513Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8671865Z "size": 105, 2025-09-07T06:39:34.8672213Z "digest": "sha256:293238514daa4b5847043bf5365fee1b28a31a5eb2c577e42d361ddf26003d7d" 2025-09-07T06:39:34.8672592Z }, 2025-09-07T06:39:34.8672756Z { 2025-09-07T06:39:34.8673019Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8673358Z "size": 611, 
2025-09-07T06:39:34.8673705Z "digest": "sha256:3fa586a8f58967b58bc6fdce55ff1e59fca8aa183a369ec1b8488841d983100e" 2025-09-07T06:39:34.8674097Z }, 2025-09-07T06:39:34.8674259Z { 2025-09-07T06:39:34.8674525Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8674866Z "size": 677677416, 2025-09-07T06:39:34.8675222Z "digest": "sha256:acb80e5ab68ea4fe53ae0e494c899f1ee1bb679f672de45718e57ed0b46593a5" 2025-09-07T06:39:34.8675608Z }, 2025-09-07T06:39:34.8675768Z { 2025-09-07T06:39:34.8676027Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8676366Z "size": 111, 2025-09-07T06:39:34.8676698Z "digest": "sha256:d88918208ec0ea95f09bdb4c327f19a67221711c6b3bdc3808fa21a3b8c82f01" 2025-09-07T06:39:34.8677076Z }, 2025-09-07T06:39:34.8677237Z { 2025-09-07T06:39:34.8677508Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8677845Z "size": 1555, 2025-09-07T06:39:34.8678188Z "digest": "sha256:f3c90fedb003713b60dec8714d41737ed0536fbdd818071b1784ed4514f60d4a" 2025-09-07T06:39:34.8678563Z }, 2025-09-07T06:39:34.8678728Z { 2025-09-07T06:39:34.8678994Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8679316Z "size": 107, 2025-09-07T06:39:34.8679652Z "digest": "sha256:cb1b2a5b0ca3689ad12d0b331407f78eb0c5b7175fd3ad519c42c93ba1319d08" 2025-09-07T06:39:34.8680036Z }, 2025-09-07T06:39:34.8680199Z { 2025-09-07T06:39:34.8680462Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8680799Z "size": 166, 2025-09-07T06:39:34.8681391Z "digest": "sha256:483a7916ae75df6f57b30ef56ef0ae9efaa151ebdfdea0190dc6a11ed869926a" 2025-09-07T06:39:34.8681782Z }, 2025-09-07T06:39:34.8681944Z { 2025-09-07T06:39:34.8682213Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8682549Z "size": 2838603, 2025-09-07T06:39:34.8682876Z "digest": 
"sha256:4e75f953b1a90095bf9d382f69451aaa954ee84b2799f5d3205eed588c49e7bf" 2025-09-07T06:39:34.8687855Z }, 2025-09-07T06:39:34.8688057Z { 2025-09-07T06:39:34.8688333Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8688676Z "size": 107, 2025-09-07T06:39:34.8689007Z "digest": "sha256:ede399652893d42a816ec89517351552b6689b8b7b9728e6086973f09ac2968f" 2025-09-07T06:39:34.8689387Z }, 2025-09-07T06:39:34.8689547Z { 2025-09-07T06:39:34.8689824Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8690175Z "size": 828, 2025-09-07T06:39:34.8690522Z "digest": "sha256:bb5cfc8fb254f12e61fa41ff8cd324b88d4f7245cfc40a4c559ef608236baa64" 2025-09-07T06:39:34.8690922Z }, 2025-09-07T06:39:34.8691084Z { 2025-09-07T06:39:34.8691357Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8691699Z "size": 26113875, 2025-09-07T06:39:34.8692048Z "digest": "sha256:31baf740473d67bb71a36e3df4bae5f509179625f19a4b576796bef96ede450f" 2025-09-07T06:39:34.8692433Z }, 2025-09-07T06:39:34.8692590Z { 2025-09-07T06:39:34.8692857Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8693190Z "size": 104, 2025-09-07T06:39:34.8693515Z "digest": "sha256:c97044f386490ca8c0b8710276a111f54f24a0ac1ef14992b853946245dd1aca" 2025-09-07T06:39:34.8693994Z }, 2025-09-07T06:39:34.8694161Z { 2025-09-07T06:39:34.8694426Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8694760Z "size": 425, 2025-09-07T06:39:34.8695092Z "digest": "sha256:3440831c489950e99dbfa45bf9088a78ad9c6f818e4fad454468f52b81c1b4af" 2025-09-07T06:39:34.8695483Z }, 2025-09-07T06:39:34.8695652Z { 2025-09-07T06:39:34.8695906Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8696244Z "size": 19309396, 2025-09-07T06:39:34.8696587Z "digest": "sha256:2a0ea75b5c44f692424382528970b36d1870ed0a14f431f652947049717c12ac" 2025-09-07T06:39:34.8696965Z }, 
2025-09-07T06:39:34.8697128Z { 2025-09-07T06:39:34.8697400Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8697736Z "size": 638, 2025-09-07T06:39:34.8698067Z "digest": "sha256:5637084b7653318beeb2994d91777e6061fb154f7c6fd4386df2b21a85fff9d1" 2025-09-07T06:39:34.8698436Z }, 2025-09-07T06:39:34.8698593Z { 2025-09-07T06:39:34.8698855Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8699184Z "size": 724, 2025-09-07T06:39:34.8699507Z "digest": "sha256:818552e09965a162f3b2717caec1c0bd350b438f27bcd1aa017c3631f3d86aa1" 2025-09-07T06:39:34.8699885Z }, 2025-09-07T06:39:34.8700045Z { 2025-09-07T06:39:34.8700310Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8700646Z "size": 148, 2025-09-07T06:39:34.8700982Z "digest": "sha256:90adaedf0a682dd3f6bb107dd10326355d7c473947209e8da5885e8fa3320d47" 2025-09-07T06:39:34.8701369Z }, 2025-09-07T06:39:34.8701535Z { 2025-09-07T06:39:34.8701807Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8702164Z "size": 136, 2025-09-07T06:39:34.8702493Z "digest": "sha256:0f06ece68ed5f9f41f9408bf6fd4db2c9e03217feeb715540cac185bd5cfc9c5" 2025-09-07T06:39:34.8702872Z }, 2025-09-07T06:39:34.8703032Z { 2025-09-07T06:39:34.8703293Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8703635Z "size": 140, 2025-09-07T06:39:34.8703969Z "digest": "sha256:2febd7b3aa97983a272c092c40b8317d868888cf9ad2403abf1ff5e3d14b14a8" 2025-09-07T06:39:34.8704351Z }, 2025-09-07T06:39:34.8704692Z { 2025-09-07T06:39:34.8704959Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8705295Z "size": 32, 2025-09-07T06:39:34.8705635Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T06:39:34.8706013Z }, 2025-09-07T06:39:34.8706174Z { 2025-09-07T06:39:34.8706436Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8706937Z "size": 223, 2025-09-07T06:39:34.8707274Z "digest": "sha256:8951084017c9b004c2c916d622ce96d84f46bb950bd25382ae9b8d83e64f93fa" 2025-09-07T06:39:34.8707652Z }, 2025-09-07T06:39:34.8707813Z { 2025-09-07T06:39:34.8708079Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8708422Z "size": 347, 2025-09-07T06:39:34.8708752Z "digest": "sha256:44b36041d1d12a66c93f5d9d63817c683264ed44fa4fb9eed44f6a6ebe6c494c" 2025-09-07T06:39:34.8709131Z }, 2025-09-07T06:39:34.8709285Z { 2025-09-07T06:39:34.8709548Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8709886Z "size": 88302, 2025-09-07T06:39:34.8710218Z "digest": "sha256:2be286176fa8f15f3001e7d3b77f6fa7286387993b879f183561c427531885bb" 2025-09-07T06:39:34.8710585Z }, 2025-09-07T06:39:34.8710746Z { 2025-09-07T06:39:34.8711086Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8711435Z "size": 106, 2025-09-07T06:39:34.8711806Z "digest": "sha256:aa0da64a2ba108a9342f5b14dc116110a04c2887fa56ea3dc16558b2759ba903" 2025-09-07T06:39:34.8712198Z }, 2025-09-07T06:39:34.8712354Z { 2025-09-07T06:39:34.8712615Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8712951Z "size": 1665, 2025-09-07T06:39:34.8713291Z "digest": "sha256:fde5747829cd71484377ab56a0407b22ce1a42c6cbf56371776339846f0b2195" 2025-09-07T06:39:34.8713666Z }, 2025-09-07T06:39:34.8713828Z { 2025-09-07T06:39:34.8714091Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8714454Z "size": 724, 2025-09-07T06:39:34.8714790Z "digest": "sha256:818552e09965a162f3b2717caec1c0bd350b438f27bcd1aa017c3631f3d86aa1" 2025-09-07T06:39:34.8715179Z }, 2025-09-07T06:39:34.8715341Z { 2025-09-07T06:39:34.8715600Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8715956Z "size": 137, 
2025-09-07T06:39:34.8716342Z "digest": "sha256:eb7bb3fc4d9623d18dfa223e6cb46cf2f3df0a59598d851ac472637fe722677c" 2025-09-07T06:39:34.8716860Z }, 2025-09-07T06:39:34.8717022Z { 2025-09-07T06:39:34.8717285Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8717621Z "size": 120, 2025-09-07T06:39:34.8717951Z "digest": "sha256:d9b121aab70c0d3e546ffe9b158656a44d9c6d3e569d6dc9a8ee5d921d888bd3" 2025-09-07T06:39:34.8718341Z }, 2025-09-07T06:39:34.8718504Z { 2025-09-07T06:39:34.8718766Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8719107Z "size": 5394477445, 2025-09-07T06:39:34.8719466Z "digest": "sha256:6c6fc404c6828fe7027e828fe563d62f1c16e678c3f9a51464fce04239bb9488" 2025-09-07T06:39:34.8719840Z }, 2025-09-07T06:39:34.8720015Z { 2025-09-07T06:39:34.8720292Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8720626Z "size": 176, 2025-09-07T06:39:34.8720952Z "digest": "sha256:9260e87d9803c7cb4a0dedb3b51785bbbe62cb5ca034284ec02742d5049e5f9c" 2025-09-07T06:39:34.8721322Z }, 2025-09-07T06:39:34.8721474Z { 2025-09-07T06:39:34.8721731Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8722060Z "size": 1896, 2025-09-07T06:39:34.8722408Z "digest": "sha256:c659a448d749ad3a3e065be7ee6f45bfde2d94cc4ef8301edd60ababcf6bd99d" 2025-09-07T06:39:34.8722791Z }, 2025-09-07T06:39:34.8722945Z { 2025-09-07T06:39:34.8723208Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8723549Z "size": 197530991, 2025-09-07T06:39:34.8724047Z "digest": "sha256:a27e57a7638d731b698c4baa57fabdaabfab6bb499327288ad8b3165055e90d0" 2025-09-07T06:39:34.8724431Z }, 2025-09-07T06:39:34.8724594Z { 2025-09-07T06:39:34.8724859Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8725196Z "size": 303, 2025-09-07T06:39:34.8725515Z "digest": 
"sha256:8806695cc47c4091164172531656ba28547f548b3852f2c65283b0dd86ee856b" 2025-09-07T06:39:34.8726015Z }, 2025-09-07T06:39:34.8726174Z { 2025-09-07T06:39:34.8726444Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8726779Z "size": 32, 2025-09-07T06:39:34.8727116Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T06:39:34.8727499Z }, 2025-09-07T06:39:34.8727657Z { 2025-09-07T06:39:34.8727920Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8728260Z "size": 108, 2025-09-07T06:39:34.8728591Z "digest": "sha256:3f90d72fbf68038c3bac5fcc798db7c7eba224a4d6ce29dc040eac495bc540ee" 2025-09-07T06:39:34.8728982Z }, 2025-09-07T06:39:34.8729151Z { 2025-09-07T06:39:34.8729407Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:39:34.8729743Z "size": 54145699, 2025-09-07T06:39:34.8730083Z "digest": "sha256:5f64499ff3f138052a072850343ead5ef35552dc184ce2c46ba3cc83277e50ba" 2025-09-07T06:39:34.8730456Z } 2025-09-07T06:39:34.8730628Z ] 2025-09-07T06:39:34.8730793Z } 2025-09-07T06:39:34.8730963Z + exit 0 2025-09-07T06:39:34.8757640Z ##[group]Run set -eux 2025-09-07T06:39:34.8757906Z set -eux 2025-09-07T06:39:34.8758259Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-09-07T06:39:34.8759221Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-09-07T06:39:34.8798584Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:34.8798912Z env: 2025-09-07T06:39:34.8799106Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:34.8799478Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:34.8800024Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 
2025-09-07T06:39:34.8800528Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:34.8801362Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:34.8802105Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:34.8802357Z AWS_REGION: us-east-1 2025-09-07T06:39:34.8802680Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:34.8802999Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:34.8807985Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:34.8808202Z ##[endgroup] 2025-09-07T06:39:34.8870712Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-09-07T06:39:34.8872285Z + jq --raw-output .SecretString 2025-09-07T06:39:34.8874732Z + jq -r .docker_hub_readonly_token 2025-09-07T06:39:34.8877908Z + docker login --username pytorchbot --password-stdin 2025-09-07T06:39:35.5258637Z 2025-09-07T06:39:35.5260854Z An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::308535385114:assumed-role/gha_workflow_s3_and_ecr_read_only/GitHubActions is not authorized to perform: secretsmanager:GetSecretValue on resource: docker_hub_readonly_token because no identity-based policy allows the secretsmanager:GetSecretValue action 2025-09-07T06:39:35.6281751Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:39:35.6309293Z + true 2025-09-07T06:39:35.6400778Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-09-07T06:39:35.6401180Z with: 2025-09-07T06:39:35.6402049Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:35.6402787Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:35.6403120Z env: 2025-09-07T06:39:35.6403311Z GIT_DEFAULT_BRANCH: main 
2025-09-07T06:39:35.6403678Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:35.6404214Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:35.6404715Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:35.6405584Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:35.6406341Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:35.6406583Z AWS_REGION: us-east-1 2025-09-07T06:39:35.6406914Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:35.6407237Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:35.6412210Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:35.6412434Z ##[endgroup] 2025-09-07T06:39:35.6427345Z ##[group]Run set -x 2025-09-07T06:39:35.6427605Z set -x 2025-09-07T06:39:35.6427804Z set +e 2025-09-07T06:39:35.6427982Z  2025-09-07T06:39:35.6428163Z login() { 2025-09-07T06:39:35.6428576Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T06:39:35.6429016Z } 2025-09-07T06:39:35.6429191Z  2025-09-07T06:39:35.6429377Z retry () { 2025-09-07T06:39:35.6429609Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T06:39:35.6429877Z } 2025-09-07T06:39:35.6430054Z  2025-09-07T06:39:35.6430259Z retry login "${DOCKER_REGISTRY}" 2025-09-07T06:39:35.6430517Z  2025-09-07T06:39:35.6430915Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-09-07T06:39:35.6431459Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-09-07T06:39:35.6431770Z  2025-09-07T06:39:35.6431948Z set -e 2025-09-07T06:39:35.6432235Z # ignore output since only exit code is used for conditional 2025-09-07T06:39:35.6432636Z # only pull docker image if it's not available locally 2025-09-07T06:39:35.6433087Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-09-07T06:39:35.6433501Z  retry docker pull "${DOCKER_IMAGE}" 2025-09-07T06:39:35.6433767Z fi 2025-09-07T06:39:35.6471770Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:35.6472103Z env: 2025-09-07T06:39:35.6472300Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:35.6472668Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:35.6473234Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:35.6473764Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:35.6474604Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:35.6475350Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:35.6475605Z AWS_REGION: us-east-1 2025-09-07T06:39:35.6475881Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:35.6476199Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:35.6481164Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:35.6482008Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:35.6482757Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:35.6483089Z ##[endgroup] 2025-09-07T06:39:35.6540588Z + set +e 2025-09-07T06:39:35.6540928Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:35.6541361Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:35.6545828Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:39:35.6547332Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:39:36.6607589Z WARNING! 
Your password will be stored unencrypted in /var/home/pytorchci/.docker/config.json. 2025-09-07T06:39:36.6608753Z Configure a credential helper to remove this warning. See 2025-09-07T06:39:36.6609769Z https://docs.docker.com/engine/reference/commandline/login/#credential-stores 2025-09-07T06:39:36.6610425Z 2025-09-07T06:39:36.6615918Z Login Succeeded 2025-09-07T06:39:36.6659884Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:36.6664276Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-09-07T06:39:37.3705760Z + IMAGE_SIZE=17761.743545532227 2025-09-07T06:39:37.3706142Z + echo 'Compressed size of image in MB: 17761.743545532227' 2025-09-07T06:39:37.3706493Z + set -e 2025-09-07T06:39:37.3706742Z Compressed size of image in MB: 17761.743545532227 2025-09-07T06:39:37.3707548Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:39:37.3978934Z Prepare all required actions 2025-09-07T06:39:37.4009177Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-09-07T06:39:37.4009503Z with: 2025-09-07T06:39:37.4010027Z github-token: *** 2025-09-07T06:39:37.4010252Z env: 2025-09-07T06:39:37.4010452Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:37.4010827Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:37.4011393Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:37.4011939Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:37.4012799Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:37.4013563Z AWS_DEFAULT_REGION: 
us-east-1 2025-09-07T06:39:37.4013912Z AWS_REGION: us-east-1 2025-09-07T06:39:37.4014292Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:37.4014625Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:37.4019620Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:37.4019849Z ##[endgroup] 2025-09-07T06:39:37.4034812Z ##[group]Run set -eux 2025-09-07T06:39:37.4035067Z set -eux 2025-09-07T06:39:37.4035465Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T06:39:37.4074513Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:37.4074885Z env: 2025-09-07T06:39:37.4075103Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:37.4075476Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:37.4076025Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:37.4076537Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:37.4077390Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:37.4078153Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:37.4078416Z AWS_REGION: us-east-1 2025-09-07T06:39:37.4078705Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:37.4079182Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:37.4084185Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:37.4084527Z GITHUB_TOKEN: *** 2025-09-07T06:39:37.4084753Z ##[endgroup] 2025-09-07T06:39:37.4159893Z + python3 .github/scripts/get_workflow_job_id.py 17524754569 gpu6c07 2025-09-07T06:39:38.1412385Z Setting output job-id=49774352868 2025-09-07T06:39:38.1413233Z Setting output job-name=linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:39:38.1599421Z Prepare all required actions 2025-09-07T06:39:38.1599813Z Getting action download info 2025-09-07T06:39:38.2943536Z Download action 
repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-09-07T06:39:38.8692403Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-09-07T06:39:39.3467454Z ##[group]Run ./.github/actions/download-build-artifacts 2025-09-07T06:39:39.3467777Z with: 2025-09-07T06:39:39.3467995Z name: linux-jammy-rocm-py3.10 2025-09-07T06:39:39.3468252Z s3-bucket: gha-artifacts 2025-09-07T06:39:39.3468472Z env: 2025-09-07T06:39:39.3468657Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:39.3469041Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:39.3469642Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:39.3470183Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:39.3471056Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:39.3471814Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:39.3472054Z AWS_REGION: us-east-1 2025-09-07T06:39:39.3472347Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:39.3472672Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:39.3477649Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:39.3477867Z ##[endgroup] 2025-09-07T06:39:39.3506704Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T06:39:39.3507005Z with: 2025-09-07T06:39:39.3507218Z name: linux-jammy-rocm-py3.10 2025-09-07T06:39:39.3507501Z s3-bucket: gha-artifacts 2025-09-07T06:39:39.3507738Z region: us-east-1 2025-09-07T06:39:39.3507947Z env: 2025-09-07T06:39:39.3508133Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:39.3508506Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:39.3509044Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 
2025-09-07T06:39:39.3509557Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:39.3510419Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:39.3511183Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:39.3511442Z AWS_REGION: us-east-1 2025-09-07T06:39:39.3511735Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:39.3512072Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:39.3517067Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:39.3517304Z ##[endgroup] 2025-09-07T06:39:39.8377343Z (node:748767) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T06:39:39.8378042Z 2025-09-07T06:39:39.8378383Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T06:39:39.8379179Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T06:39:39.8379997Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T06:39:40.0253694Z Found 1 objects with prefix pytorch/pytorch/17524754569/linux-jammy-rocm-py3.10/ 2025-09-07T06:39:40.0254980Z Starting download (1/1): /var/home/pytorchci/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-09-07T06:39:49.7978488Z Finished download (1/1): /var/home/pytorchci/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-09-07T06:39:49.7985015Z Artifact download has finished successfully 2025-09-07T06:39:49.8377185Z ##[group]Run unzip -o artifacts.zip 2025-09-07T06:39:49.8377770Z unzip -o artifacts.zip 2025-09-07T06:39:49.8416505Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:39:49.8416861Z env: 2025-09-07T06:39:49.8417071Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:39:49.8417754Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:39:49.8418350Z RUNNER_TEST_RESULTS_DIR: 
/var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:39:49.8418866Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:39:49.8419735Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:39:49.8420492Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:39:49.8420742Z AWS_REGION: us-east-1 2025-09-07T06:39:49.8421045Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:39:49.8421366Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:39:49.8426368Z AWS_SESSION_TOKEN: *** 2025-09-07T06:39:49.8426589Z ##[endgroup] 2025-09-07T06:39:49.8506422Z Archive: artifacts.zip 2025-09-07T06:39:49.8507317Z creating: dist/ 2025-09-07T06:39:52.8634517Z inflating: dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl 2025-09-07T06:39:52.8788474Z inflating: dist/.ninja_log 2025-09-07T06:39:52.8789092Z creating: build/custom_test_artifacts/ 2025-09-07T06:39:52.8789697Z creating: build/custom_test_artifacts/custom-op-build/ 2025-09-07T06:39:52.8790428Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-09-07T06:39:52.8791289Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-09-07T06:39:52.8795181Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T06:39:52.8795740Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/ 2025-09-07T06:39:52.8796306Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T06:39:52.8796910Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T06:39:52.8797479Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T06:39:52.8799668Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T06:39:52.8801317Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T06:39:52.8802231Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T06:39:52.8802846Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T06:39:52.8803425Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T06:39:52.8805748Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T06:39:52.8807282Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T06:39:52.8808353Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T06:39:52.8809874Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T06:39:52.8811427Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T06:39:52.8812068Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-09-07T06:39:52.8812577Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-09-07T06:39:52.8813110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-09-07T06:39:52.8813670Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-09-07T06:39:52.8814553Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-09-07T06:39:52.8815668Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-09-07T06:39:52.8816605Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-09-07T06:39:52.8817283Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-09-07T06:39:52.8817936Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-09-07T06:39:52.8818600Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-09-07T06:39:52.8819239Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-09-07T06:39:52.8819879Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-09-07T06:39:52.8820515Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-09-07T06:39:52.8842587Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-09-07T06:39:52.9067437Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-09-07T06:39:52.9068471Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-09-07T06:39:52.9069582Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-09-07T06:39:52.9070840Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-09-07T06:39:52.9072070Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-09-07T06:39:52.9073190Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-09-07T06:39:52.9074345Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-09-07T06:39:52.9075306Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-09-07T06:39:52.9075979Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-09-07T06:39:52.9076655Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-09-07T06:39:52.9077311Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-09-07T06:39:52.9097229Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-09-07T06:39:52.9186634Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-09-07T06:39:52.9187869Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T06:39:52.9189016Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-09-07T06:39:52.9190062Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-09-07T06:39:52.9191021Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-09-07T06:39:52.9191932Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-09-07T06:39:52.9192900Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/InstallScripts.json 2025-09-07T06:39:52.9193876Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_outer_vec.cc 2025-09-07T06:39:52.9194806Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_vec_ext.cc 2025-09-07T06:39:52.9195502Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-09-07T06:39:52.9195983Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-09-07T06:39:52.9196660Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-09-07T06:39:52.9384312Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-09-07T06:39:52.9445060Z inflating: 
build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-09-07T06:39:52.9445826Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-09-07T06:39:52.9446513Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-09-07T06:39:52.9447349Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-09-07T06:39:52.9450714Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T06:39:52.9451660Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/ 2025-09-07T06:39:52.9452566Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T06:39:52.9453545Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T06:39:52.9454610Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T06:39:52.9455711Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T06:39:52.9456548Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T06:39:52.9457176Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T06:39:52.9457776Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T06:39:52.9458346Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T06:39:52.9460633Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T06:39:52.9462091Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T06:39:52.9463045Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T06:39:52.9464651Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T06:39:52.9466197Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T06:39:52.9466828Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-09-07T06:39:52.9467327Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-09-07T06:39:52.9467848Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-09-07T06:39:52.9468409Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-09-07T06:39:52.9469031Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-09-07T06:39:52.9469744Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-09-07T06:39:52.9470448Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-09-07T06:39:52.9471076Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-09-07T06:39:52.9471727Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-09-07T06:39:52.9472388Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-09-07T06:39:52.9473052Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-09-07T06:39:52.9473696Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-09-07T06:39:52.9474341Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-09-07T06:39:52.9497443Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-09-07T06:39:52.9568037Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-09-07T06:39:52.9569259Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T06:39:52.9570374Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-09-07T06:39:52.9571359Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-09-07T06:39:52.9572268Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-09-07T06:39:52.9573150Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-09-07T06:39:52.9574205Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/InstallScripts.json 2025-09-07T06:39:52.9575178Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_outer_vec.cc 2025-09-07T06:39:52.9576094Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_vec_ext.cc 2025-09-07T06:39:52.9576676Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-09-07T06:39:52.9577138Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-09-07T06:39:52.9577596Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-09-07T06:39:52.9618371Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-09-07T06:39:52.9619136Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-09-07T06:39:52.9619876Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-09-07T06:39:52.9620756Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-09-07T06:39:52.9624335Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T06:39:52.9625378Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/ 2025-09-07T06:39:52.9626376Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T06:39:52.9627111Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T06:39:52.9627717Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T06:39:52.9628436Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T06:39:52.9629921Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T06:39:52.9630615Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T06:39:52.9631265Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T06:39:52.9631900Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T06:39:52.9634096Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T06:39:52.9635599Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T06:39:52.9636565Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T06:39:52.9638194Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T06:39:52.9639732Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T06:39:52.9640415Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-09-07T06:39:52.9640956Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-09-07T06:39:52.9641722Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 
2025-09-07T06:39:52.9642496Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-09-07T06:39:52.9643187Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-09-07T06:39:52.9643950Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-09-07T06:39:52.9644688Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-09-07T06:39:52.9645379Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-09-07T06:39:52.9646098Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-09-07T06:39:52.9646825Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-09-07T06:39:52.9647548Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-09-07T06:39:52.9648270Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-09-07T06:39:52.9648971Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-09-07T06:39:52.9651719Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-09-07T06:39:52.9785554Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-09-07T06:39:52.9787014Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-09-07T06:39:52.9788372Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-09-07T06:39:52.9789766Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-09-07T06:39:52.9791087Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-09-07T06:39:52.9792321Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-09-07T06:39:52.9793596Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-09-07T06:39:52.9794876Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-09-07T06:39:52.9796170Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-09-07T06:39:52.9797004Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-09-07T06:39:52.9797743Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-09-07T06:39:52.9815408Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-09-07T06:39:52.9875340Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-09-07T06:39:52.9876170Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T06:39:52.9877358Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-09-07T06:39:52.9878423Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-09-07T06:39:52.9879407Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-09-07T06:39:52.9880366Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-09-07T06:39:52.9881738Z 
inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/InstallScripts.json 2025-09-07T06:39:52.9883027Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_outer_vec.cc 2025-09-07T06:39:52.9884026Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_vec_ext.cc 2025-09-07T06:39:52.9884958Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-09-07T06:39:52.9885798Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-09-07T06:39:52.9886660Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-09-07T06:39:52.9994808Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-09-07T06:39:53.0037137Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-09-07T06:39:53.0037832Z creating: build/lib/ 2025-09-07T06:39:53.0127144Z inflating: build/lib/libprotobuf-lite.a 2025-09-07T06:39:53.0607519Z inflating: build/lib/libprotobuf.a 2025-09-07T06:39:53.1146454Z inflating: build/lib/libprotoc.a 2025-09-07T06:39:53.1156714Z inflating: build/lib/libpthreadpool.a 2025-09-07T06:39:53.1165388Z inflating: build/lib/libcpuinfo.a 2025-09-07T06:39:53.1173776Z inflating: build/lib/libcpuinfo_internals.a 2025-09-07T06:39:53.1174942Z inflating: build/lib/libclog.a 2025-09-07T06:39:53.1195196Z inflating: build/lib/libpytorch_qnnpack.a 2025-09-07T06:39:53.1197451Z inflating: build/lib/libnnpack_reference_layers.a 2025-09-07T06:39:53.1397859Z inflating: build/lib/libmicrokernels-prod.a 2025-09-07T06:39:53.1417156Z inflating: build/lib/libnnpack.a 2025-09-07T06:39:53.2359480Z inflating: build/lib/libmicrokernels-all.a 2025-09-07T06:39:53.2435015Z inflating: build/lib/libgtest.a 2025-09-07T06:39:53.2453411Z inflating: build/lib/libgmock.a 2025-09-07T06:39:53.2454152Z inflating: build/lib/libgtest_main.a 2025-09-07T06:39:53.2454754Z inflating: build/lib/libgmock_main.a 
2025-09-07T06:39:53.2552474Z inflating: build/lib/libXNNPACK.a 2025-09-07T06:39:53.2633616Z inflating: build/lib/libbenchmark.a 2025-09-07T06:39:53.2634206Z inflating: build/lib/libbenchmark_main.a 2025-09-07T06:39:53.2704184Z inflating: build/lib/libasmjit.a 2025-09-07T06:39:53.2712647Z inflating: build/lib/libittnotify.a 2025-09-07T06:39:53.2713422Z inflating: build/lib/libjitprofiling.a 2025-09-07T06:39:53.4058393Z inflating: build/lib/libfbgemm.a 2025-09-07T06:39:53.4060971Z inflating: build/lib/libtensorpipe_uv.a 2025-09-07T06:39:53.4666758Z inflating: build/lib/libtensorpipe.a 2025-09-07T06:39:53.4797516Z inflating: build/lib/libgloo.a 2025-09-07T06:39:53.4850316Z inflating: build/lib/libonnx_proto.a 2025-09-07T06:39:53.5326268Z inflating: build/lib/libgloo_hip.a 2025-09-07T06:39:53.6109398Z inflating: build/lib/libonnx.a 2025-09-07T06:39:54.7348440Z inflating: build/lib/libdnnl.a 2025-09-07T06:39:54.7367740Z inflating: build/lib/libfmt.a 2025-09-07T06:39:54.7680430Z inflating: build/lib/libkineto.a 2025-09-07T06:39:54.7800171Z inflating: build/lib/libc10.so 2025-09-07T06:39:54.7801481Z inflating: build/lib/libtorch_global_deps.so 2025-09-07T06:39:54.7803232Z inflating: build/lib/libcaffe2_nvrtc.so 2025-09-07T06:39:54.7860430Z inflating: build/lib/libc10_hip.so 2025-09-07T06:39:54.8503290Z inflating: build/lib/libfbgemm_genai.a 2025-09-07T06:39:58.1001406Z inflating: build/lib/libtorch_cpu.so 2025-09-07T06:39:58.1004950Z inflating: build/lib/libshm.so 2025-09-07T06:39:59.0761720Z inflating: build/lib/libtorch_hip.so 2025-09-07T06:39:59.0762906Z inflating: build/lib/libtorch.so 2025-09-07T06:39:59.0782841Z inflating: build/lib/libjitbackend_test.so 2025-09-07T06:39:59.0859533Z inflating: build/lib/libtorchbind_test.so 2025-09-07T06:39:59.0885207Z inflating: build/lib/libbackend_with_compiler.so 2025-09-07T06:39:59.0913845Z inflating: build/lib/libaoti_custom_ops.so 2025-09-07T06:39:59.3214696Z inflating: build/lib/libtorch_python.so 
2025-09-07T06:39:59.3252678Z inflating: build/lib/libnnapi_backend.so 2025-09-07T06:39:59.3253227Z creating: build/bin/ 2025-09-07T06:39:59.3253620Z creating: build/bin/CMakeFiles/ 2025-09-07T06:39:59.3254281Z inflating: build/bin/cmake_install.cmake 2025-09-07T06:39:59.3254802Z inflating: build/bin/CTestTestfile.cmake 2025-09-07T06:39:59.3747287Z inflating: build/bin/protoc-3.13.0.0 2025-09-07T06:39:59.4234713Z inflating: build/bin/protoc 2025-09-07T06:39:59.4297607Z inflating: build/bin/c10_AllocatorConfig_test 2025-09-07T06:39:59.4356481Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-09-07T06:39:59.4417175Z inflating: build/bin/c10_DeviceGuard_test 2025-09-07T06:39:59.4478649Z inflating: build/bin/c10_Device_test 2025-09-07T06:39:59.4547760Z inflating: build/bin/c10_DispatchKeySet_test 2025-09-07T06:39:59.4605735Z inflating: build/bin/c10_StreamGuard_test 2025-09-07T06:39:59.4669425Z inflating: build/bin/c10_Scalar_test 2025-09-07T06:39:59.4736429Z inflating: build/bin/c10_SymInt_test 2025-09-07T06:39:59.4800368Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-09-07T06:39:59.4865887Z inflating: build/bin/c10_InlineStreamGuard_test 2025-09-07T06:39:59.4928194Z inflating: build/bin/c10_Bitset_test 2025-09-07T06:39:59.4986932Z inflating: build/bin/c10_ArrayRef_test 2025-09-07T06:39:59.5053985Z inflating: build/bin/c10_SizesAndStrides_test 2025-09-07T06:39:59.5135139Z inflating: build/bin/c10_cow_test 2025-09-07T06:39:59.5193182Z inflating: build/bin/c10_ConstexprCrc_test 2025-09-07T06:39:59.5260106Z inflating: build/bin/c10_Enumerate_test 2025-09-07T06:39:59.5318975Z inflating: build/bin/c10_DeadlockDetection_test 2025-09-07T06:39:59.5381017Z inflating: build/bin/c10_IntrusiveList_test 2025-09-07T06:39:59.5440565Z inflating: build/bin/c10_Half_test 2025-09-07T06:39:59.5507425Z inflating: build/bin/c10_LeftRight_test 2025-09-07T06:39:59.5572532Z inflating: build/bin/c10_Metaprogramming_test 2025-09-07T06:39:59.5635304Z inflating: 
build/bin/c10_NetworkFlow_test 2025-09-07T06:39:59.5694462Z inflating: build/bin/c10_Synchronized_test 2025-09-07T06:39:59.5753227Z inflating: build/bin/c10_Semaphore_test 2025-09-07T06:39:59.5819264Z inflating: build/bin/c10_ThreadLocal_test 2025-09-07T06:39:59.5880393Z inflating: build/bin/c10_TypeIndex_test 2025-09-07T06:39:59.5940817Z inflating: build/bin/c10_TypeList_test 2025-09-07T06:39:59.6001637Z inflating: build/bin/c10_accumulate_test 2025-09-07T06:39:59.6059725Z inflating: build/bin/c10_TypeTraits_test 2025-09-07T06:39:59.6125308Z inflating: build/bin/c10_bfloat16_test 2025-09-07T06:39:59.6192156Z inflating: build/bin/c10_complex_math_test 2025-09-07T06:39:59.6251522Z inflating: build/bin/c10_bit_cast_test 2025-09-07T06:39:59.6316470Z inflating: build/bin/c10_complex_test 2025-09-07T06:39:59.6378386Z inflating: build/bin/c10_exception_test 2025-09-07T06:39:59.6437303Z inflating: build/bin/c10_error_test 2025-09-07T06:39:59.6496877Z inflating: build/bin/c10_flags_test 2025-09-07T06:39:59.6556291Z inflating: build/bin/c10_generic_math_test 2025-09-07T06:39:59.6616342Z inflating: build/bin/c10_irange_test 2025-09-07T06:39:59.6804792Z inflating: build/bin/c10_intrusive_ptr_test 2025-09-07T06:39:59.6867548Z inflating: build/bin/c10_lazy_test 2025-09-07T06:39:59.6935176Z inflating: build/bin/c10_logging_test 2025-09-07T06:39:59.7022651Z inflating: build/bin/c10_optional_test 2025-09-07T06:39:59.7094904Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-09-07T06:39:59.7157393Z inflating: build/bin/c10_registry_test 2025-09-07T06:39:59.7334997Z inflating: build/bin/c10_small_vector_test 2025-09-07T06:39:59.7395787Z inflating: build/bin/c10_ssize_test 2025-09-07T06:39:59.7462252Z inflating: build/bin/c10_string_util_test 2025-09-07T06:39:59.7520189Z inflating: build/bin/c10_string_view_test 2025-09-07T06:39:59.7579470Z inflating: build/bin/c10_tempfile_test 2025-09-07T06:39:59.7631031Z inflating: build/bin/c10_intrusive_ptr_benchmark 
2025-09-07T06:39:59.7696827Z inflating: build/bin/c10_typeid_test 2025-09-07T06:39:59.7754847Z inflating: build/bin/c10_hip_HIPAssertionsTest_1_var_test 2025-09-07T06:39:59.7812788Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_stream 2025-09-07T06:39:59.7871105Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-09-07T06:39:59.7928722Z inflating: build/bin/c10_hip_HIPAssertionsTest_from_2_processes 2025-09-07T06:39:59.7986866Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 2025-09-07T06:39:59.8044714Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-09-07T06:39:59.8102561Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-09-07T06:39:59.8160774Z inflating: build/bin/c10_hip_HIPTest 2025-09-07T06:39:59.8822658Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-09-07T06:39:59.9502756Z inflating: build/bin/vec_test_all_types_AVX512 2025-09-07T06:40:00.0188914Z inflating: build/bin/vec_test_all_types_AVX2 2025-09-07T06:40:00.0251299Z inflating: build/bin/BackoffTest 2025-09-07T06:40:00.0314688Z inflating: build/bin/FileStoreTest 2025-09-07T06:40:00.0381544Z inflating: build/bin/TCPStoreTest 2025-09-07T06:40:00.0445555Z inflating: build/bin/HashStoreTest 2025-09-07T06:40:00.0523363Z inflating: build/bin/ProcessGroupGlooTest 2025-09-07T06:40:00.0526465Z inflating: build/bin/example_allreduce 2025-09-07T06:40:00.0530713Z inflating: build/bin/torch_shm_manager 2025-09-07T06:40:00.0593507Z inflating: build/bin/static_runtime_bench 2025-09-07T06:40:00.0882959Z inflating: build/bin/static_runtime_test 2025-09-07T06:40:00.0969605Z inflating: build/bin/Dict_test 2025-09-07T06:40:00.1031739Z inflating: build/bin/Dimname_test 2025-09-07T06:40:00.1109197Z inflating: build/bin/MaybeOwned_test 2025-09-07T06:40:00.1177276Z inflating: build/bin/NamedTensor_test 2025-09-07T06:40:00.1246037Z inflating: 
build/bin/apply_utils_test 2025-09-07T06:40:00.1315123Z inflating: build/bin/atest 2025-09-07T06:40:00.1389695Z inflating: build/bin/basic 2025-09-07T06:40:00.1454642Z inflating: build/bin/broadcast_test 2025-09-07T06:40:00.1514659Z inflating: build/bin/cpu_allocator_test 2025-09-07T06:40:00.1582877Z inflating: build/bin/cpu_generator_test 2025-09-07T06:40:00.1645170Z inflating: build/bin/cpu_profiling_allocator_test 2025-09-07T06:40:00.1751529Z inflating: build/bin/cpu_rng_test 2025-09-07T06:40:00.1811485Z inflating: build/bin/dlconvertor_test 2025-09-07T06:40:00.1879222Z inflating: build/bin/extension_backend_test 2025-09-07T06:40:00.1944940Z inflating: build/bin/half_test 2025-09-07T06:40:00.2055202Z inflating: build/bin/ivalue_test 2025-09-07T06:40:00.2113989Z inflating: build/bin/lazy_tensor_test 2025-09-07T06:40:00.2178659Z inflating: build/bin/math_kernel_test 2025-09-07T06:40:00.2241654Z inflating: build/bin/memory_format_test 2025-09-07T06:40:00.2304765Z inflating: build/bin/memory_overlapping_test 2025-09-07T06:40:00.2365494Z inflating: build/bin/operator_name_test 2025-09-07T06:40:00.2431599Z inflating: build/bin/native_test 2025-09-07T06:40:00.2494349Z inflating: build/bin/mobile_memory_cleanup 2025-09-07T06:40:00.2555528Z inflating: build/bin/operators_test 2025-09-07T06:40:00.2617439Z inflating: build/bin/packedtensoraccessor_test 2025-09-07T06:40:00.2695700Z inflating: build/bin/pow_test 2025-09-07T06:40:00.2762937Z inflating: build/bin/quantized_test 2025-09-07T06:40:00.2823350Z inflating: build/bin/reportMemoryUsage_test 2025-09-07T06:40:00.2882498Z inflating: build/bin/reduce_ops_test 2025-09-07T06:40:00.2949097Z inflating: build/bin/scalar_tensor_test 2025-09-07T06:40:00.3009669Z inflating: build/bin/StorageUtils_test 2025-09-07T06:40:00.3079578Z inflating: build/bin/scalar_test 2025-09-07T06:40:00.3141530Z inflating: build/bin/stride_properties_test 2025-09-07T06:40:00.3206026Z inflating: build/bin/type_ptr_test 2025-09-07T06:40:00.3299328Z 
inflating: build/bin/tensor_iterator_test 2025-09-07T06:40:00.3359778Z inflating: build/bin/thread_init_test 2025-09-07T06:40:00.3424556Z inflating: build/bin/test_parallel 2025-09-07T06:40:00.3495149Z inflating: build/bin/type_test 2025-09-07T06:40:00.3558315Z inflating: build/bin/undefined_tensor_test 2025-09-07T06:40:00.3620385Z inflating: build/bin/verify_api_visibility 2025-09-07T06:40:00.3702309Z inflating: build/bin/legacy_vmap_test 2025-09-07T06:40:00.3763148Z inflating: build/bin/weakref_test 2025-09-07T06:40:00.3823938Z inflating: build/bin/wrapdim_test 2025-09-07T06:40:00.3894244Z inflating: build/bin/IListRef_test 2025-09-07T06:40:00.3956081Z inflating: build/bin/xla_tensor_test 2025-09-07T06:40:00.4081147Z inflating: build/bin/List_test 2025-09-07T06:40:00.4219362Z inflating: build/bin/kernel_function_legacy_test 2025-09-07T06:40:00.4297028Z inflating: build/bin/KernelFunction_test 2025-09-07T06:40:00.4406967Z inflating: build/bin/kernel_function_test 2025-09-07T06:40:00.4550943Z inflating: build/bin/kernel_lambda_legacy_test 2025-09-07T06:40:00.4668320Z inflating: build/bin/kernel_lambda_test 2025-09-07T06:40:00.4739102Z inflating: build/bin/kernel_stackbased_test 2025-09-07T06:40:00.4799498Z inflating: build/bin/CppSignature_test 2025-09-07T06:40:00.4908576Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-09-07T06:40:00.4969086Z inflating: build/bin/op_allowlist_test 2025-09-07T06:40:00.5034257Z inflating: build/bin/backend_fallback_test 2025-09-07T06:40:00.5387186Z inflating: build/bin/op_registration_test 2025-09-07T06:40:00.5445362Z inflating: build/bin/hip_complex_math_test 2025-09-07T06:40:00.5523484Z inflating: build/bin/inline_container_test 2025-09-07T06:40:00.5582031Z inflating: build/bin/hip_complex_test 2025-09-07T06:40:00.5644420Z inflating: build/bin/hip_apply_test 2025-09-07T06:40:00.5702479Z inflating: build/bin/hip_distributions_test 2025-09-07T06:40:00.5760415Z inflating: build/bin/hip_generator_test 
2025-09-07T06:40:00.5819066Z inflating: build/bin/hip_half_test 2025-09-07T06:40:00.5877122Z inflating: build/bin/hip_integer_divider_test 2025-09-07T06:40:00.5935079Z inflating: build/bin/hip_optional_test 2025-09-07T06:40:00.5993077Z inflating: build/bin/hip_packedtensoraccessor_test 2025-09-07T06:40:00.6050948Z inflating: build/bin/hip_vectorized_test 2025-09-07T06:40:00.6111981Z inflating: build/bin/hip_dlconvertor_test 2025-09-07T06:40:00.7324132Z inflating: build/bin/test_jit 2025-09-07T06:40:00.7738648Z inflating: build/bin/test_nativert 2025-09-07T06:40:00.7803282Z inflating: build/bin/test_dist_autograd 2025-09-07T06:40:00.7882006Z inflating: build/bin/test_cpp_rpc 2025-09-07T06:40:00.9157926Z inflating: build/bin/test_api 2025-09-07T06:40:00.9160483Z inflating: build/bin/parallel_benchmark 2025-09-07T06:40:00.9550144Z inflating: build/bin/test_lazy 2025-09-07T06:40:00.9550707Z creating: .additional_ci_files/ 2025-09-07T06:40:00.9647206Z inflating: .additional_ci_files/test-times.json 2025-09-07T06:40:01.0013014Z inflating: .additional_ci_files/test-class-times.json 2025-09-07T06:40:01.0051150Z ##[group]Run rm artifacts.zip 2025-09-07T06:40:01.0051430Z rm artifacts.zip 2025-09-07T06:40:01.0090593Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:01.0090935Z env: 2025-09-07T06:40:01.0091138Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:01.0091511Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:01.0092377Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:01.0092889Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:01.0094055Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:01.0094832Z AWS_DEFAULT_REGION: us-east-1 
2025-09-07T06:40:01.0095081Z AWS_REGION: us-east-1 2025-09-07T06:40:01.0095393Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:01.0095715Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:01.0101023Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:01.0101239Z ##[endgroup] 2025-09-07T06:40:01.2441866Z ##[group]Run df -H 2025-09-07T06:40:01.2442097Z df -H 2025-09-07T06:40:01.2481068Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:01.2481406Z env: 2025-09-07T06:40:01.2481609Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:01.2482010Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:01.2482580Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:01.2483116Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:01.2483944Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:01.2484692Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:01.2484949Z AWS_REGION: us-east-1 2025-09-07T06:40:01.2485256Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:01.2485595Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:01.2490637Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:01.2490868Z ##[endgroup] 2025-09-07T06:40:01.2583302Z Filesystem Size Used Avail Use% Mounted on 2025-09-07T06:40:01.2583771Z tmpfs 109G 33M 109G 1% /run 2025-09-07T06:40:01.2584211Z /dev/nvme0n1p2 944G 72G 825G 8% / 2025-09-07T06:40:01.2584654Z tmpfs 542G 33k 542G 1% /dev/shm 2025-09-07T06:40:01.2585080Z tmpfs 5.3M 0 5.3M 0% /run/lock 2025-09-07T06:40:01.2585531Z /dev/nvme0n1p1 536M 6.4M 530M 2% /boot/efi 2025-09-07T06:40:01.2585997Z /dev/nvme1n1p1 3.8T 1.2T 2.5T 33% /media/4TB 2025-09-07T06:40:01.2586465Z tmpfs 109G 33k 109G 1% /run/user/1307800118 2025-09-07T06:40:01.2586951Z 172.18.148.8:/export/amd2 5.5T 278G 5.3T 6% /mnt 
2025-09-07T06:40:01.2587933Z pure1.jax.cs.cpe.ice.amd.com:/homes/amd-pytorch 108G 403M 107G 1% /home/amd-pytorch 2025-09-07T06:40:01.2588555Z 172.18.148.15:/GroupStorage 165T 133T 33T 81% /groups 2025-09-07T06:40:01.2589097Z 172.18.148.15:/GroupStorage/Scratch 5.5T 1.8T 3.8T 33% /scratch 2025-09-07T06:40:01.2618767Z Prepare all required actions 2025-09-07T06:40:01.2619158Z Getting action download info 2025-09-07T06:40:01.4239749Z ##[group]Run ./.github/actions/download-td-artifacts 2025-09-07T06:40:01.4240059Z with: 2025-09-07T06:40:01.4240240Z env: 2025-09-07T06:40:01.4240423Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:01.4240800Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:01.4241342Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:01.4241840Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:01.4242685Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:01.4243721Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:01.4243979Z AWS_REGION: us-east-1 2025-09-07T06:40:01.4244292Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:01.4244615Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:01.4249595Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:01.4249827Z ##[endgroup] 2025-09-07T06:40:01.4276899Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T06:40:01.4277209Z with: 2025-09-07T06:40:01.4277398Z name: td_results 2025-09-07T06:40:01.4277612Z s3-bucket: gha-artifacts 2025-09-07T06:40:01.4277842Z region: us-east-1 2025-09-07T06:40:01.4278035Z env: 2025-09-07T06:40:01.4278225Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:01.4278598Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:01.4279154Z RUNNER_TEST_RESULTS_DIR: 
/var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:01.4279652Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:01.4280502Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:01.4281244Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:01.4281494Z AWS_REGION: us-east-1 2025-09-07T06:40:01.4281770Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:01.4282086Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:01.4287067Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:01.4287296Z ##[endgroup] 2025-09-07T06:40:01.9153190Z (node:748845) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T06:40:01.9153916Z 2025-09-07T06:40:01.9154212Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T06:40:01.9155099Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T06:40:01.9156097Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T06:40:02.0730493Z Found 1 objects with prefix pytorch/pytorch/17524754569/td_results/ 2025-09-07T06:40:02.0731540Z Starting download (1/1): /var/home/pytorchci/actions-runner/_work/pytorch/pytorch/td_results.json 2025-09-07T06:40:02.2801251Z Finished download (1/1): /var/home/pytorchci/actions-runner/_work/pytorch/pytorch/td_results.json 2025-09-07T06:40:02.2806710Z Artifact download has finished successfully 2025-09-07T06:40:02.3220733Z ##[group]Run mkdir -p .additional_ci_files 2025-09-07T06:40:02.3221071Z mkdir -p .additional_ci_files 2025-09-07T06:40:02.3221453Z mv td_results.json .additional_ci_files/td_results.json || true 2025-09-07T06:40:02.3260011Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:02.3260350Z env: 2025-09-07T06:40:02.3260559Z GIT_DEFAULT_BRANCH: main 
2025-09-07T06:40:02.3260939Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:02.3261517Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:02.3262077Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:02.3263242Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:02.3264027Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:02.3264285Z AWS_REGION: us-east-1 2025-09-07T06:40:02.3264590Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:02.3264927Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:02.3270005Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:02.3270237Z ##[endgroup] 2025-09-07T06:40:02.3425353Z ##[group]Run .github/scripts/parse_ref.py 2025-09-07T06:40:02.3425711Z .github/scripts/parse_ref.py 2025-09-07T06:40:02.3462310Z shell: /usr/bin/bash -e {0} 2025-09-07T06:40:02.3462565Z env: 2025-09-07T06:40:02.3462775Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:02.3463169Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:02.3464025Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:02.3464560Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:02.3465412Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:02.3466161Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:02.3466424Z AWS_REGION: us-east-1 2025-09-07T06:40:02.3466725Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:02.3467113Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:02.3472105Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:02.3472343Z ##[endgroup] 
2025-09-07T06:40:02.3714412Z Setting output branch=main 2025-09-07T06:40:02.3832697Z Prepare all required actions 2025-09-07T06:40:02.3833115Z Getting action download info 2025-09-07T06:40:02.5547553Z ##[group]Run ./.github/actions/filter-test-configs 2025-09-07T06:40:02.5547887Z with: 2025-09-07T06:40:02.5548315Z github-token: *** 2025-09-07T06:40:02.5549008Z test-matrix: {"include": [{"config": "slow", "shard": 1, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}, {"config": "slow", "shard": 2, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}]} 2025-09-07T06:40:02.5549878Z job-name: linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:40:02.5550285Z env: 2025-09-07T06:40:02.5550474Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:02.5550839Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:02.5551372Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:02.5551909Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:02.5552743Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:02.5553513Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:02.5553751Z AWS_REGION: us-east-1 2025-09-07T06:40:02.5554013Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:02.5554329Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:02.5559314Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:02.5559529Z ##[endgroup] 2025-09-07T06:40:02.5592642Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T06:40:02.5592942Z with: 2025-09-07T06:40:02.5593130Z shell: bash 2025-09-07T06:40:02.5593345Z timeout_minutes: 10 2025-09-07T06:40:02.5593577Z max_attempts: 5 2025-09-07T06:40:02.5593801Z retry_wait_seconds: 30 2025-09-07T06:40:02.5594496Z command: set 
-eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T06:40:02.5595223Z polling_interval_seconds: 1 2025-09-07T06:40:02.5595475Z warning_on_retry: true 2025-09-07T06:40:02.5595723Z continue_on_error: false 2025-09-07T06:40:02.5595961Z env: 2025-09-07T06:40:02.5596170Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:02.5596557Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:02.5597131Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:02.5597652Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:02.5598509Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:02.5599282Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:02.5599552Z AWS_REGION: us-east-1 2025-09-07T06:40:02.5599839Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:02.5600183Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:02.5605463Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:02.5605809Z GITHUB_TOKEN: *** 2025-09-07T06:40:02.5606031Z ##[endgroup] 2025-09-07T06:40:02.6422789Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T06:40:02.9401640Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T06:40:03.0520500Z Requirement already satisfied: requests==2.27.1 in /var/home/pytorchci/.local/lib/python3.10/site-packages (2.27.1) 2025-09-07T06:40:03.0525709Z Requirement already satisfied: pyyaml==6.0.2 in /var/home/pytorchci/.local/lib/python3.10/site-packages (6.0.2) 2025-09-07T06:40:03.0629383Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests==2.27.1) (1.26.5) 2025-09-07T06:40:03.0639355Z 
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests==2.27.1) (3.3) 2025-09-07T06:40:03.0650663Z Requirement already satisfied: charset-normalizer~=2.0.0 in /var/home/pytorchci/.local/lib/python3.10/site-packages (from requests==2.27.1) (2.0.12) 2025-09-07T06:40:03.0656152Z Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests==2.27.1) (2020.6.20) 2025-09-07T06:40:03.6412237Z Command completed after 1 attempt(s). 2025-09-07T06:40:03.6485755Z ##[group]Run set -x 2025-09-07T06:40:03.6486011Z set -x 2025-09-07T06:40:03.6486209Z  2025-09-07T06:40:03.6486556Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T06:40:03.6486993Z # in runner workspace 2025-09-07T06:40:03.6487361Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-09-07T06:40:03.6526429Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:03.6526774Z env: 2025-09-07T06:40:03.6526975Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:03.6527407Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:03.6528006Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:03.6528552Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:03.6529400Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:03.6530153Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:03.6530413Z AWS_REGION: us-east-1 2025-09-07T06:40:03.6530722Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:03.6531060Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:03.6536132Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:03.6536364Z ##[endgroup] 2025-09-07T06:40:03.6596676Z + python3 
/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-09-07T06:40:03.6774099Z Setting output branch=main 2025-09-07T06:40:03.6832110Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T06:40:03.6832513Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T06:40:03.6832808Z echo "Job name: ${JOB_NAME}" 2025-09-07T06:40:03.6833073Z  2025-09-07T06:40:03.6833412Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T06:40:03.6833837Z # in runner workspace 2025-09-07T06:40:03.6834209Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-09-07T06:40:03.6834626Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-09-07T06:40:03.6834929Z  --job-name "${JOB_NAME}" \ 2025-09-07T06:40:03.6835715Z  --test-matrix "{"include": [{"config": "slow", "shard": 1, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}, {"config": "slow", "shard": 2, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}]}" \ 2025-09-07T06:40:03.6836475Z  --selected-test-configs "" \ 2025-09-07T06:40:03.6836767Z  --pr-number "${PR_NUMBER}" \ 2025-09-07T06:40:03.6837280Z  --tag "${TAG}" \ 2025-09-07T06:40:03.6837536Z  --event-name "${EVENT_NAME}" \ 2025-09-07T06:40:03.6837812Z  --schedule "${SCHEDULE}" \ 2025-09-07T06:40:03.6838081Z  --branch "${HEAD_BRANCH}" 2025-09-07T06:40:03.6873269Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:03.6873615Z env: 2025-09-07T06:40:03.6873827Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:03.6874221Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:03.6874778Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:03.6875283Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:03.6876387Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri 
--group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:03.6877189Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:03.6877447Z AWS_REGION: us-east-1 2025-09-07T06:40:03.6877733Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:03.6878060Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:03.6883054Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:03.6883405Z GITHUB_TOKEN: *** 2025-09-07T06:40:03.6883776Z JOB_NAME: linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:40:03.6884184Z PR_NUMBER: 2025-09-07T06:40:03.6884377Z TAG: 2025-09-07T06:40:03.6884559Z EVENT_NAME: push 2025-09-07T06:40:03.6884758Z SCHEDULE: 2025-09-07T06:40:03.6884941Z HEAD_BRANCH: main 2025-09-07T06:40:03.6885140Z ##[endgroup] 2025-09-07T06:40:03.6946265Z Workflow: slow 2025-09-07T06:40:03.6946663Z Job name: linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:40:04.2760086Z Setting output keep-going=True 2025-09-07T06:40:04.2760626Z Setting output ci-verbose-test-logs=False 2025-09-07T06:40:04.2761178Z Setting output ci-test-showlocals=False 2025-09-07T06:40:04.2761691Z Setting output ci-no-test-timeout=False 2025-09-07T06:40:04.2762191Z Setting output ci-no-td=False 2025-09-07T06:40:04.2762653Z Setting output ci-td-distributed=False 2025-09-07T06:40:04.2763129Z Setting output is-unstable=False 2025-09-07T06:40:04.2763571Z Setting output reenabled-issues= 2025-09-07T06:40:04.2764900Z Setting output test-matrix={"include": [{"config": "slow", "shard": 1, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}, {"config": "slow", "shard": 2, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}]} 2025-09-07T06:40:04.2766276Z Setting output is-test-matrix-empty=False 2025-09-07T06:40:04.2928609Z ##[group]Run echo "Filtered matrix:" 2025-09-07T06:40:04.2928916Z echo "Filtered matrix:" 2025-09-07T06:40:04.2929652Z 
echo "{"include": [{"config": "slow", "shard": 1, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}, {"config": "slow", "shard": 2, "num_shards": 2, "runner": "linux.rocm.gpu.2", "owners": ["module:rocm"]}]}" 2025-09-07T06:40:04.2930388Z  2025-09-07T06:40:04.2930575Z echo 2025-09-07T06:40:04.2930816Z echo "Is the current job unstable? False" 2025-09-07T06:40:04.2931105Z  2025-09-07T06:40:04.2931275Z echo 2025-09-07T06:40:04.2931495Z echo "Is keep-going label set? True" 2025-09-07T06:40:04.2931766Z  2025-09-07T06:40:04.2931946Z echo 2025-09-07T06:40:04.2932161Z echo "Reenabled issues? " 2025-09-07T06:40:04.2970313Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:04.2970664Z env: 2025-09-07T06:40:04.2970872Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:04.2971267Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:04.2971853Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:04.2972636Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:04.2973467Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:04.2974286Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:04.2974546Z AWS_REGION: us-east-1 2025-09-07T06:40:04.2974844Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:04.2975169Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:04.2980168Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:04.2980408Z ##[endgroup] 2025-09-07T06:40:04.3034422Z Filtered matrix: 2025-09-07T06:40:04.3035129Z {include: [{config: slow, shard: 1, num_shards: 2, runner: linux.rocm.gpu.2, owners: [module:rocm]}, {config: slow, shard: 2, num_shards: 2, runner: linux.rocm.gpu.2, owners: [module:rocm]}]} 2025-09-07T06:40:04.3035967Z 2025-09-07T06:40:04.3036089Z Is the current job 
unstable? False 2025-09-07T06:40:04.3036272Z 2025-09-07T06:40:04.3036399Z Is keep-going label set? True 2025-09-07T06:40:04.3036843Z 2025-09-07T06:40:04.3037002Z Reenabled issues? 2025-09-07T06:40:04.3102606Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:04.3103088Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:04.3141290Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:04.3141640Z env: 2025-09-07T06:40:04.3141853Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:04.3142226Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:04.3142778Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:04.3143317Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:04.3144162Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:04.3144942Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:04.3145230Z AWS_REGION: us-east-1 2025-09-07T06:40:04.3145528Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:04.3145850Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:04.3150813Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:04.3151033Z JOB_TIMEOUT: 300 2025-09-07T06:40:04.3151228Z ##[endgroup] 2025-09-07T06:40:04.3262869Z ##[group]Run set -x 2025-09-07T06:40:04.3263176Z set -x 2025-09-07T06:40:04.3263389Z  2025-09-07T06:40:04.3263614Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-09-07T06:40:04.3263977Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-09-07T06:40:04.3264345Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-09-07T06:40:04.3264671Z  TEST_COMMAND=.ci/caffe2/test.sh 2025-09-07T06:40:04.3264947Z else 2025-09-07T06:40:04.3265178Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T06:40:04.3265484Z fi 
2025-09-07T06:40:04.3265676Z  2025-09-07T06:40:04.3265981Z # detached container should get cleaned up by teardown_ec2_linux 2025-09-07T06:40:04.3266443Z # TODO: Stop building test binaries as part of the build phase 2025-09-07T06:40:04.3266846Z # Used for GPU_FLAG since that doesn't play nice 2025-09-07T06:40:04.3267228Z # shellcheck disable=SC2086,SC2090 2025-09-07T06:40:04.3267530Z container_name=$(docker run \ 2025-09-07T06:40:04.3267807Z  ${GPU_FLAG:-} \ 2025-09-07T06:40:04.3268053Z  -e BUILD_ENVIRONMENT \ 2025-09-07T06:40:04.3268315Z  -e PR_NUMBER \ 2025-09-07T06:40:04.3268557Z  -e GITHUB_ACTIONS \ 2025-09-07T06:40:04.3268822Z  -e GITHUB_REPOSITORY \ 2025-09-07T06:40:04.3269105Z  -e GITHUB_WORKFLOW \ 2025-09-07T06:40:04.3269368Z  -e GITHUB_JOB \ 2025-09-07T06:40:04.3269600Z  -e GITHUB_RUN_ID \ 2025-09-07T06:40:04.3269841Z  -e GITHUB_RUN_NUMBER \ 2025-09-07T06:40:04.3270359Z  -e GITHUB_RUN_ATTEMPT \ 2025-09-07T06:40:04.3270611Z  -e JOB_ID \ 2025-09-07T06:40:04.3270833Z  -e JOB_NAME \ 2025-09-07T06:40:04.3271058Z  -e BRANCH \ 2025-09-07T06:40:04.3271271Z  -e SHA1 \ 2025-09-07T06:40:04.3271490Z  -e AWS_DEFAULT_REGION \ 2025-09-07T06:40:04.3271753Z  -e IN_WHEEL_TEST \ 2025-09-07T06:40:04.3271991Z  -e SHARD_NUMBER \ 2025-09-07T06:40:04.3272229Z  -e TEST_CONFIG \ 2025-09-07T06:40:04.3272474Z  -e NUM_TEST_SHARDS \ 2025-09-07T06:40:04.3272731Z  -e REENABLED_ISSUES \ 2025-09-07T06:40:04.3272988Z  -e CONTINUE_THROUGH_ERROR \ 2025-09-07T06:40:04.3273256Z  -e VERBOSE_TEST_LOGS \ 2025-09-07T06:40:04.3273511Z  -e TEST_SHOWLOCALS \ 2025-09-07T06:40:04.3273760Z  -e NO_TEST_TIMEOUT \ 2025-09-07T06:40:04.3274001Z  -e NO_TD \ 2025-09-07T06:40:04.3274259Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-09-07T06:40:04.3274582Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-09-07T06:40:04.3274906Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-09-07T06:40:04.3275202Z  -e TESTS_TO_INCLUDE \ 2025-09-07T06:40:04.3275454Z  -e DASHBOARD_TAG \ 2025-09-07T06:40:04.3275779Z  
--env-file="${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" \ 2025-09-07T06:40:04.3276150Z  --ulimit stack=10485760:83886080 \ 2025-09-07T06:40:04.3276442Z  --ulimit core=0 \ 2025-09-07T06:40:04.3276709Z  --security-opt seccomp=unconfined \ 2025-09-07T06:40:04.3277011Z  --cap-add=SYS_PTRACE \ 2025-09-07T06:40:04.3277272Z  --shm-size="8g" \ 2025-09-07T06:40:04.3277507Z  --tty \ 2025-09-07T06:40:04.3277721Z  --detach \ 2025-09-07T06:40:04.3277966Z  --name="${container_name}" \ 2025-09-07T06:40:04.3278246Z  --user jenkins \ 2025-09-07T06:40:04.3278563Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-09-07T06:40:04.3278913Z  -w /var/lib/jenkins/workspace \ 2025-09-07T06:40:04.3279194Z  "${DOCKER_IMAGE}" 2025-09-07T06:40:04.3279419Z ) 2025-09-07T06:40:04.3279635Z # save container name for later step 2025-09-07T06:40:04.3280183Z echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV" 2025-09-07T06:40:04.3280819Z # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home 2025-09-07T06:40:04.3281617Z docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" 2025-09-07T06:40:04.3317773Z shell: /usr/bin/bash -e {0} 2025-09-07T06:40:04.3318042Z env: 2025-09-07T06:40:04.3318245Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:04.3318653Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T06:40:04.3319253Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T06:40:04.3319787Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T06:40:04.3320649Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:04.3321408Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:04.3321670Z AWS_REGION: us-east-1 2025-09-07T06:40:04.3321971Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:04.3322313Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:04.3327314Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:04.3327577Z BUILD_ENVIRONMENT: linux-jammy-rocm-py3.10 2025-09-07T06:40:04.3327872Z PR_NUMBER: 2025-09-07T06:40:04.3328089Z GITHUB_REPOSITORY: pytorch/pytorch 2025-09-07T06:40:04.3328359Z GITHUB_WORKFLOW: slow 2025-09-07T06:40:04.3328579Z GITHUB_JOB: test 2025-09-07T06:40:04.3329025Z GITHUB_RUN_ID: 17524754569 2025-09-07T06:40:04.3329271Z GITHUB_RUN_NUMBER: 17294 2025-09-07T06:40:04.3329514Z GITHUB_RUN_ATTEMPT: 1 2025-09-07T06:40:04.3329738Z JOB_ID: 49774352868 2025-09-07T06:40:04.3330099Z JOB_NAME: linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:40:04.3330504Z BRANCH: main 2025-09-07T06:40:04.3330731Z SHA1: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:04.3331033Z CONTINUE_THROUGH_ERROR: True 2025-09-07T06:40:04.3331286Z VERBOSE_TEST_LOGS: False 2025-09-07T06:40:04.3331525Z TEST_SHOWLOCALS: False 2025-09-07T06:40:04.3331756Z NO_TEST_TIMEOUT: False 
2025-09-07T06:40:04.3331976Z NO_TD: False 2025-09-07T06:40:04.3332175Z TEST_CONFIG: slow 2025-09-07T06:40:04.3332383Z SHARD_NUMBER: 1 2025-09-07T06:40:04.3332585Z NUM_TEST_SHARDS: 2 2025-09-07T06:40:04.3332801Z REENABLED_ISSUES: 2025-09-07T06:40:04.3333408Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:04.3334185Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2025-09-07T06:40:04.3334457Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-09-07T06:40:04.3334733Z TESTS_TO_INCLUDE: 2025-09-07T06:40:04.3334945Z DASHBOARD_TAG: 2025-09-07T06:40:04.3335156Z ##[endgroup] 2025-09-07T06:40:04.3395996Z + [[ slow == \m\u\l\t\i\g\p\u ]] 2025-09-07T06:40:04.3396350Z + [[ linux-jammy-rocm-py3.10 == *onnx* ]] 2025-09-07T06:40:04.3396712Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T06:40:04.3410126Z +++ nproc --ignore=2 2025-09-07T06:40:04.3429939Z ++ docker run --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e MAX_JOBS=126 -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e TESTS_TO_INCLUDE -e DASHBOARD_TAG --env-file=/var/home/pytorchci/actions-runner/_work/_temp/github_env_17524754569 --ulimit stack=10485760:83886080 --ulimit core=0 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=8g --tty --detach --name= --user jenkins -v 
/var/home/pytorchci/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:04.4855908Z + container_name=9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T06:40:04.4856946Z + echo CONTAINER_NAME=9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T06:40:04.4858492Z + docker exec -t 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f sh -c 'cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh' 2025-09-07T06:40:21.6369344Z Processing ./dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl 2025-09-07T06:40:22.1004886Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d) (3.19.1) 2025-09-07T06:40:22.1009601Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d) (4.15.0) 2025-09-07T06:40:22.1013401Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d) (1.13.3) 2025-09-07T06:40:22.1017691Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d) (2.8.8) 2025-09-07T06:40:22.1020677Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d) (3.1.6) 2025-09-07T06:40:22.1024815Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d) (2025.7.0) 2025-09-07T06:40:22.1396648Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from 
sympy>=1.13.3->torch==2.9.0a0+git93fb23d) (1.3.0) 2025-09-07T06:40:22.1429446Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.9.0a0+git93fb23d) (3.0.2) 2025-09-07T06:40:22.5422686Z Installing collected packages: torch 2025-09-07T06:40:33.4456712Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-09-07T06:40:33.4458264Z helion 0.1.3 requires filecheck, which is not installed. 2025-09-07T06:40:33.4459031Z Successfully installed torch-2.9.0a0+git93fb23d 2025-09-07T06:40:33.5016321Z + export TERM=vt100 2025-09-07T06:40:33.5016714Z + TERM=vt100 2025-09-07T06:40:33.5022805Z ++ dirname .ci/pytorch/test.sh 2025-09-07T06:40:33.5037487Z + source .ci/pytorch/common.sh 2025-09-07T06:40:33.5044569Z +++ dirname .ci/pytorch/common.sh 2025-09-07T06:40:33.5059034Z ++ source .ci/pytorch/common_utils.sh 2025-09-07T06:40:33.5060624Z +++ declare -f -t trap_add 2025-09-07T06:40:33.5064971Z ++ set -ex -o pipefail 2025-09-07T06:40:33.5065252Z ++ [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-09-07T06:40:33.5065528Z ++ unset HIP_PLATFORM 2025-09-07T06:40:33.5065766Z ++ export PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:40:33.5066032Z ++ PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:40:33.5066278Z ++ BUILD_TEST_LIBTORCH=0 2025-09-07T06:40:33.5072456Z ++ dirname .ci/pytorch/test.sh 2025-09-07T06:40:33.5085452Z + source .ci/pytorch/common-build.sh 2025-09-07T06:40:33.5087587Z ++ [[ linux-jammy-rocm-py3.10 != *win-* ]] 2025-09-07T06:40:33.5098957Z ++++ dirname .ci/pytorch/common-build.sh 2025-09-07T06:40:33.5114176Z +++ cd .ci/pytorch 2025-09-07T06:40:33.5114844Z +++ pwd -P 2025-09-07T06:40:33.5118655Z ++ script_dir=/var/lib/jenkins/pytorch/.ci/pytorch 2025-09-07T06:40:33.5119033Z ++ [[ linux-jammy-rocm-py3.10 == *-pch* ]] 2025-09-07T06:40:33.5119692Z ++ which sccache 2025-09-07T06:40:33.5137134Z ++ [[ 
-z '' ]] 2025-09-07T06:40:33.5137352Z ++ unset SCCACHE_BUCKET 2025-09-07T06:40:33.5137583Z ++ unset SCCACHE_REGION 2025-09-07T06:40:33.5137817Z ++ sccache --stop-server 2025-09-07T06:40:33.5174356Z ++ true 2025-09-07T06:40:33.5174709Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-09-07T06:40:33.5192343Z ++ trap_add sccache_epilogue EXIT 2025-09-07T06:40:33.5192645Z ++ trap_add_cmd=sccache_epilogue 2025-09-07T06:40:33.5192926Z ++ shift 2025-09-07T06:40:33.5193178Z ++ for trap_add_name in "$@" 2025-09-07T06:40:33.5206689Z ++++ trap -p EXIT 2025-09-07T06:40:33.5211496Z +++ eval 'extract_trap_cmd ' 2025-09-07T06:40:33.5211737Z ++++ extract_trap_cmd 2025-09-07T06:40:33.5211952Z ++++ printf '%s\n' '' 2025-09-07T06:40:33.5212464Z +++ printf '%s\n' sccache_epilogue 2025-09-07T06:40:33.5215652Z ++ trap -- ' 2025-09-07T06:40:33.5216186Z sccache_epilogue' EXIT 2025-09-07T06:40:33.5216396Z ++ [[ -n '' ]] 2025-09-07T06:40:33.5216624Z ++ [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-09-07T06:40:33.5216957Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T06:40:33.5217275Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-09-07T06:40:33.5217510Z ++ sccache --start-server 2025-09-07T06:40:33.5241679Z sccache: Starting the server... 2025-09-07T06:40:33.5605215Z sccache: Listening on address 127.0.0.1:4226 2025-09-07T06:40:33.5625404Z ++ sccache --zero-stats 2025-09-07T06:40:33.5655174Z Statistics zeroed. 
2025-09-07T06:40:33.5660626Z ++ which ccache 2025-09-07T06:40:33.5677192Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-09-07T06:40:33.5677497Z + echo 'Environment variables:' 2025-09-07T06:40:33.5678129Z Environment variables: 2025-09-07T06:40:33.5678354Z + env 2025-09-07T06:40:33.5694156Z GITHUB_WORKSPACE=/var/home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-09-07T06:40:33.5695109Z CONTINUE_THROUGH_ERROR=True 2025-09-07T06:40:33.5695665Z BUILD_ENVIRONMENT=linux-jammy-rocm-py3.10 2025-09-07T06:40:33.5696316Z HOSTNAME=gpu6c07.jax.cs.cpe.ice.amd.com 2025-09-07T06:40:33.5697342Z GITHUB_PATH=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/add_path_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5698273Z GITHUB_ACTION=__self 2025-09-07T06:40:33.5698658Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T06:40:33.5699094Z GITHUB_RUN_NUMBER=17294 2025-09-07T06:40:33.5699444Z TEST_CONFIG=slow 2025-09-07T06:40:33.5699805Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T06:40:33.5700253Z AWS_DEFAULT_REGION=us-east-1 2025-09-07T06:40:33.5700682Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T06:40:33.5701138Z GITHUB_REF_TYPE=branch 2025-09-07T06:40:33.5701770Z *** 2025-09-07T06:40:33.5702101Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T06:40:33.5702517Z GITHUB_ACTIONS=true 2025-09-07T06:40:33.5702917Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:33.5703464Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:33.5704584Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/slow.yml@refs/heads/main 2025-09-07T06:40:33.5705737Z UCC_HOME=/usr 2025-09-07T06:40:33.5706287Z VERBOSE_TEST_LOGS=False 2025-09-07T06:40:33.5706937Z GITHUB_REF=refs/heads/main 2025-09-07T06:40:33.5707584Z SHARD_NUMBER=1 2025-09-07T06:40:33.5708007Z GITHUB_REF_PROTECTED=true 2025-09-07T06:40:33.5708383Z HOME=/var/lib/jenkins 2025-09-07T06:40:33.5708731Z GITHUB_API_URL=https://api.github.com 2025-09-07T06:40:33.5709155Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T06:40:33.5709520Z LANG=C.UTF-8 2025-09-07T06:40:33.5709857Z UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6 2025-09-07T06:40:33.5710301Z PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:40:33.5710614Z NUM_TEST_SHARDS=2 2025-09-07T06:40:33.5710893Z UCX_HOME=/usr 2025-09-07T06:40:33.5711627Z GITHUB_STATE=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/save_state_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5712652Z JOB_NAME=linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:40:33.5713241Z MAGMA_HOME=/opt/rocm/magma 2025-09-07T06:40:33.5714172Z GITHUB_ENV=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_env_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5715201Z GITHUB_EVENT_PATH=/var/home/pytorchci/actions-runner/_work/_temp/_github_workflow/event.json 2025-09-07T06:40:33.5715746Z GITHUB_EVENT_NAME=push 2025-09-07T06:40:33.5716000Z DASHBOARD_TAG= 2025-09-07T06:40:33.5716238Z GITHUB_RUN_ID=17524754569 2025-09-07T06:40:33.5716936Z GITHUB_STEP_SUMMARY=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/step_summary_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5717672Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T06:40:33.5717963Z PR_NUMBER= 2025-09-07T06:40:33.5718190Z GITHUB_RUN_ATTEMPT=1 2025-09-07T06:40:33.5718447Z ANACONDA_PYTHON_VERSION=3.10 2025-09-07T06:40:33.5718791Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T06:40:33.5719142Z TERM=vt100 2025-09-07T06:40:33.5719361Z INSTALLED_VISION=yes 2025-09-07T06:40:33.5719601Z BRANCH=main 2025-09-07T06:40:33.5719832Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T06:40:33.5720111Z TESTS_TO_INCLUDE= 2025-09-07T06:40:33.5720637Z GITHUB_ACTION_PATH=/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-09-07T06:40:33.5721252Z GITHUB_SERVER_URL=https://github.com 2025-09-07T06:40:33.5721580Z 
PYTORCH_ROCM_ARCH=gfx90a;gfx942 2025-09-07T06:40:33.5721910Z UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d 2025-09-07T06:40:33.5722245Z REENABLED_ISSUES= 2025-09-07T06:40:33.5722476Z SHLVL=1 2025-09-07T06:40:33.5722679Z MAX_JOBS=126 2025-09-07T06:40:33.5722902Z GITHUB_ACTOR_ID=97764156 2025-09-07T06:40:33.5723411Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:33.5723795Z GITHUB_REF_NAME=main 2025-09-07T06:40:33.5724046Z ROCM_PATH=/opt/rocm 2025-09-07T06:40:33.5724279Z GITHUB_JOB=test 2025-09-07T06:40:33.5724515Z NO_TEST_TIMEOUT=False 2025-09-07T06:40:33.5724799Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T06:40:33.5725083Z LC_ALL=C.UTF-8 2025-09-07T06:40:33.5725285Z GITHUB_RETENTION_DAYS=90 2025-09-07T06:40:33.5725519Z OPENSSL_DIR=/opt/openssl 2025-09-07T06:40:33.5725751Z GITHUB_ACTION_REPOSITORY= 2025-09-07T06:40:33.5726558Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:40:33.5727379Z GITHUB_BASE_REF= 2025-09-07T06:40:33.5727569Z CI=true 2025-09-07T06:40:33.5727766Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T06:40:33.5728002Z JOB_ID=49774352868 2025-09-07T06:40:33.5728196Z GITHUB_HEAD_REF= 2025-09-07T06:40:33.5728398Z GITHUB_ACTION_REF= 2025-09-07T06:40:33.5728595Z TEST_SHOWLOCALS=False 2025-09-07T06:40:33.5728817Z GITHUB_WORKFLOW=slow 2025-09-07T06:40:33.5729042Z DEBIAN_FRONTEND=noninteractive 2025-09-07T06:40:33.5729598Z GITHUB_OUTPUT=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_output_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5730150Z NO_TD=False 2025-09-07T06:40:33.5730341Z OLDPWD=/var/lib/jenkins 2025-09-07T06:40:33.5730551Z _=/usr/bin/env 2025-09-07T06:40:33.5730832Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-09-07T06:40:33.5842043Z + 
TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-09-07T06:40:33.5842917Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-09-07T06:40:33.5843747Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-09-07T06:40:33.5844720Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-09-07T06:40:33.5845508Z + BUILD_DIR=build 2025-09-07T06:40:33.5845935Z + BUILD_RENAMED_DIR=build_renamed 2025-09-07T06:40:33.5846430Z + BUILD_BIN_DIR=build/bin 2025-09-07T06:40:33.5846859Z + SHARD_NUMBER=1 2025-09-07T06:40:33.5847240Z + NUM_TEST_SHARDS=2 2025-09-07T06:40:33.5847677Z + export TORCH_SERIALIZATION_DEBUG=1 2025-09-07T06:40:33.5848204Z + TORCH_SERIALIZATION_DEBUG=1 2025-09-07T06:40:33.5849002Z + export VALGRIND=ON 2025-09-07T06:40:33.5849362Z + VALGRIND=ON 2025-09-07T06:40:33.5849727Z + [[ linux-jammy-rocm-py3.10 == *clang9* ]] 2025-09-07T06:40:33.5850224Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-09-07T06:40:33.5850658Z + detect_cuda_arch 2025-09-07T06:40:33.5851014Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-09-07T06:40:33.5851488Z + [[ linux-jammy-rocm-py3.10 == *s390x* ]] 2025-09-07T06:40:33.5851907Z + [[ 0 == \1 ]] 2025-09-07T06:40:33.5852214Z + [[ True == \1 ]] 2025-09-07T06:40:33.5852570Z + [[ linux-jammy-rocm-py3.10 != *bazel* ]] 2025-09-07T06:40:33.5853050Z ++ realpath build/custom_test_artifacts 2025-09-07T06:40:33.5866080Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/pytorch/build/custom_test_artifacts 2025-09-07T06:40:33.5866506Z + [[ -n '' ]] 2025-09-07T06:40:33.5866716Z + echo 'Environment variables' 2025-09-07T06:40:33.5866954Z Environment variables 2025-09-07T06:40:33.5867158Z + env 2025-09-07T06:40:33.5878262Z GITHUB_WORKSPACE=/var/home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-09-07T06:40:33.5878660Z CONTINUE_THROUGH_ERROR=True 2025-09-07T06:40:33.5878930Z BUILD_ENVIRONMENT=linux-jammy-rocm-py3.10 
2025-09-07T06:40:33.5879273Z HOSTNAME=gpu6c07.jax.cs.cpe.ice.amd.com 2025-09-07T06:40:33.5879861Z GITHUB_PATH=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/add_path_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5880425Z GITHUB_ACTION=__self 2025-09-07T06:40:33.5880657Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T06:40:33.5880921Z GITHUB_RUN_NUMBER=17294 2025-09-07T06:40:33.5881151Z TEST_CONFIG=slow 2025-09-07T06:40:33.5881555Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T06:40:33.5881830Z AWS_DEFAULT_REGION=us-east-1 2025-09-07T06:40:33.5882088Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T06:40:33.5882369Z GITHUB_REF_TYPE=branch 2025-09-07T06:40:33.5882614Z *** 2025-09-07T06:40:33.5882816Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T06:40:33.5883049Z GITHUB_ACTIONS=true 2025-09-07T06:40:33.5883288Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:33.5883607Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:33.5884054Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/slow.yml@refs/heads/main 2025-09-07T06:40:33.5884468Z UCC_HOME=/usr 2025-09-07T06:40:33.5884672Z TORCH_SERIALIZATION_DEBUG=1 2025-09-07T06:40:33.5884913Z VERBOSE_TEST_LOGS=False 2025-09-07T06:40:33.5885145Z GITHUB_REF=refs/heads/main 2025-09-07T06:40:33.5885377Z SHARD_NUMBER=1 2025-09-07T06:40:33.5885584Z GITHUB_REF_PROTECTED=true 2025-09-07T06:40:33.5885814Z HOME=/var/lib/jenkins 2025-09-07T06:40:33.5886055Z GITHUB_API_URL=https://api.github.com 2025-09-07T06:40:33.5886349Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T06:40:33.5886607Z LANG=C.UTF-8 2025-09-07T06:40:33.5886849Z UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6 2025-09-07T06:40:33.5887163Z PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:40:33.5887387Z NUM_TEST_SHARDS=2 2025-09-07T06:40:33.5887591Z UCX_HOME=/usr 2025-09-07T06:40:33.5888129Z 
GITHUB_STATE=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/save_state_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5888871Z JOB_NAME=linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T06:40:33.5889297Z MAGMA_HOME=/opt/rocm/magma 2025-09-07T06:40:33.5889841Z GITHUB_ENV=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_env_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5890612Z GITHUB_EVENT_PATH=/var/home/pytorchci/actions-runner/_work/_temp/_github_workflow/event.json 2025-09-07T06:40:33.5891074Z GITHUB_EVENT_NAME=push 2025-09-07T06:40:33.5891299Z DASHBOARD_TAG= 2025-09-07T06:40:33.5891509Z GITHUB_RUN_ID=17524754569 2025-09-07T06:40:33.5892092Z GITHUB_STEP_SUMMARY=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/step_summary_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5892730Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T06:40:33.5892974Z PR_NUMBER= 2025-09-07T06:40:33.5893170Z GITHUB_RUN_ATTEMPT=1 2025-09-07T06:40:33.5893535Z VALGRIND=ON 2025-09-07T06:40:33.5893747Z ANACONDA_PYTHON_VERSION=3.10 2025-09-07T06:40:33.5894126Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T06:40:33.5894440Z TERM=vt100 2025-09-07T06:40:33.5894626Z INSTALLED_VISION=yes 2025-09-07T06:40:33.5894842Z BRANCH=main 2025-09-07T06:40:33.5895044Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T06:40:33.5895287Z TESTS_TO_INCLUDE= 2025-09-07T06:40:33.5895731Z GITHUB_ACTION_PATH=/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-09-07T06:40:33.5896233Z GITHUB_SERVER_URL=https://github.com 2025-09-07T06:40:33.5896509Z PYTORCH_ROCM_ARCH=gfx90a;gfx942 2025-09-07T06:40:33.5896797Z UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d 2025-09-07T06:40:33.5897085Z REENABLED_ISSUES= 2025-09-07T06:40:33.5897304Z SHLVL=1 2025-09-07T06:40:33.5897480Z MAX_JOBS=126 2025-09-07T06:40:33.5897719Z GITHUB_ACTOR_ID=97764156 
2025-09-07T06:40:33.5898011Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:33.5898344Z GITHUB_REF_NAME=main 2025-09-07T06:40:33.5898559Z ROCM_PATH=/opt/rocm 2025-09-07T06:40:33.5898770Z GITHUB_JOB=test 2025-09-07T06:40:33.5898972Z NO_TEST_TIMEOUT=False 2025-09-07T06:40:33.5899214Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T06:40:33.5899474Z LC_ALL=C.UTF-8 2025-09-07T06:40:33.5899680Z GITHUB_RETENTION_DAYS=90 2025-09-07T06:40:33.5899910Z OPENSSL_DIR=/opt/openssl 2025-09-07T06:40:33.5900140Z GITHUB_ACTION_REPOSITORY= 2025-09-07T06:40:33.5900948Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:40:33.5901962Z GITHUB_BASE_REF= 2025-09-07T06:40:33.5902157Z CI=true 2025-09-07T06:40:33.5902357Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T06:40:33.5902617Z JOB_ID=49774352868 2025-09-07T06:40:33.5902818Z GITHUB_HEAD_REF= 2025-09-07T06:40:33.5903024Z GITHUB_ACTION_REF= 2025-09-07T06:40:33.5903231Z TEST_SHOWLOCALS=False 2025-09-07T06:40:33.5903445Z GITHUB_WORKFLOW=slow 2025-09-07T06:40:33.5903667Z DEBIAN_FRONTEND=noninteractive 2025-09-07T06:40:33.5904227Z GITHUB_OUTPUT=/var/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_output_571a68a5-eb21-4058-96a4-5f79085b9cf2 2025-09-07T06:40:33.5904788Z NO_TD=False 2025-09-07T06:40:33.5904973Z OLDPWD=/var/lib/jenkins 2025-09-07T06:40:33.5905184Z _=/usr/bin/env 2025-09-07T06:40:33.5905388Z + echo 'Testing pytorch' 2025-09-07T06:40:33.5905612Z Testing pytorch 2025-09-07T06:40:33.5905814Z + export LANG=C.UTF-8 2025-09-07T06:40:33.5906020Z + LANG=C.UTF-8 2025-09-07T06:40:33.5906214Z + PR_NUMBER= 2025-09-07T06:40:33.5906409Z + [[ slow == \d\e\f\a\u\l\t ]] 2025-09-07T06:40:33.5906657Z + [[ slow == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-09-07T06:40:33.5906908Z + [[ slow == \s\l\o\w ]] 2025-09-07T06:40:33.5907142Z + export 
PYTORCH_TEST_WITH_SLOW=1 2025-09-07T06:40:33.5907405Z + PYTORCH_TEST_WITH_SLOW=1 2025-09-07T06:40:33.5907647Z + export PYTORCH_TEST_SKIP_FAST=1 2025-09-07T06:40:33.5907905Z + PYTORCH_TEST_SKIP_FAST=1 2025-09-07T06:40:33.5908183Z + [[ linux-jammy-rocm-py3.10 == *slow-gradcheck* ]] 2025-09-07T06:40:33.5908505Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-09-07T06:40:33.5908792Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-09-07T06:40:33.5909085Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T06:40:33.5909383Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T06:40:33.5909651Z + [[ slow == *crossref* ]] 2025-09-07T06:40:33.5909885Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-09-07T06:40:33.5910152Z + export VALGRIND=OFF 2025-09-07T06:40:33.5910366Z + VALGRIND=OFF 2025-09-07T06:40:33.5910560Z + rocminfo 2025-09-07T06:40:33.6021276Z ROCk module version 6.10.5 is loaded 2025-09-07T06:40:33.8091627Z ===================== 2025-09-07T06:40:33.8092311Z HSA System Attributes 2025-09-07T06:40:33.8092960Z ===================== 2025-09-07T06:40:33.8093446Z Runtime Version: 1.15 2025-09-07T06:40:33.8094529Z Runtime Ext Version: 1.7 2025-09-07T06:40:33.8095319Z System Timestamp Freq.: 1000.000000MHz 2025-09-07T06:40:33.8096047Z Sig. 
Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-09-07T06:40:33.8096844Z Machine Model: LARGE 2025-09-07T06:40:33.8097542Z System Endianness: LITTLE 2025-09-07T06:40:33.8098103Z Mwaitx: DISABLED 2025-09-07T06:40:33.8098676Z XNACK enabled: NO 2025-09-07T06:40:33.8099288Z DMAbuf Support: YES 2025-09-07T06:40:33.8099912Z VMM Support: YES 2025-09-07T06:40:33.8100313Z 2025-09-07T06:40:33.8100538Z ========== 2025-09-07T06:40:33.8101099Z HSA Agents 2025-09-07T06:40:33.8101669Z ========== 2025-09-07T06:40:33.8102200Z ******* 2025-09-07T06:40:33.8102593Z Agent 1 2025-09-07T06:40:33.8102895Z ******* 2025-09-07T06:40:33.8103363Z Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:40:33.8103803Z Uuid: CPU-XX 2025-09-07T06:40:33.8104261Z Marketing Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:40:33.8104725Z Vendor Name: CPU 2025-09-07T06:40:33.8105165Z Feature: None specified 2025-09-07T06:40:33.8105608Z Profile: FULL_PROFILE 2025-09-07T06:40:33.8106055Z Float Round Mode: NEAR 2025-09-07T06:40:33.8106689Z Max Queue Number: 0(0x0) 2025-09-07T06:40:33.8107140Z Queue Min Size: 0(0x0) 2025-09-07T06:40:33.8107568Z Queue Max Size: 0(0x0) 2025-09-07T06:40:33.8108003Z Queue Type: MULTI 2025-09-07T06:40:33.8108425Z Node: 0 2025-09-07T06:40:33.8108867Z Device Type: CPU 2025-09-07T06:40:33.8109258Z Cache Info: 2025-09-07T06:40:33.8109582Z L1: 32768(0x8000) KB 2025-09-07T06:40:33.8109934Z Chip ID: 0(0x0) 2025-09-07T06:40:33.8110308Z ASIC Revision: 0(0x0) 2025-09-07T06:40:33.8110699Z Cacheline Size: 64(0x40) 2025-09-07T06:40:33.8111094Z Max Clock Freq. (MHz): 2000 2025-09-07T06:40:33.8111477Z BDFID: 0 2025-09-07T06:40:33.8111843Z Internal Node ID: 0 2025-09-07T06:40:33.8112227Z Compute Unit: 64 2025-09-07T06:40:33.8112617Z SIMDs per CU: 0 2025-09-07T06:40:33.8113001Z Shader Engines: 0 2025-09-07T06:40:33.8113398Z Shader Arrs. per Eng.: 0 2025-09-07T06:40:33.8113807Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:40:33.8114159Z Memory Properties: 2025-09-07T06:40:33.8114431Z Features: None 2025-09-07T06:40:33.8114707Z Pool Info: 2025-09-07T06:40:33.8114961Z Pool 1 2025-09-07T06:40:33.8115279Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8115674Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:40:33.8116052Z Allocatable: TRUE 2025-09-07T06:40:33.8116447Z Alloc Granule: 4KB 2025-09-07T06:40:33.8116866Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8117418Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8117823Z Accessible by all: TRUE 2025-09-07T06:40:33.8118170Z Pool 2 2025-09-07T06:40:33.8118482Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8118870Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:40:33.8119233Z Allocatable: TRUE 2025-09-07T06:40:33.8119573Z Alloc Granule: 4KB 2025-09-07T06:40:33.8119938Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8120297Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8120647Z Accessible by all: TRUE 2025-09-07T06:40:33.8120958Z Pool 3 2025-09-07T06:40:33.8121240Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:40:33.8121571Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:40:33.8121900Z Allocatable: TRUE 2025-09-07T06:40:33.8122252Z Alloc Granule: 4KB 2025-09-07T06:40:33.8122608Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8122965Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8123328Z Accessible by all: TRUE 2025-09-07T06:40:33.8123759Z Pool 4 2025-09-07T06:40:33.8124037Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8124359Z Size: 528249852(0x1f7c73fc) KB 2025-09-07T06:40:33.8124845Z Allocatable: TRUE 2025-09-07T06:40:33.8125194Z Alloc Granule: 4KB 2025-09-07T06:40:33.8125550Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8125910Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8126261Z Accessible by all: TRUE 2025-09-07T06:40:33.8126564Z ISA Info: 2025-09-07T06:40:33.8126790Z ******* 2025-09-07T06:40:33.8127010Z Agent 2 2025-09-07T06:40:33.8127227Z ******* 
2025-09-07T06:40:33.8127494Z Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:40:33.8127824Z Uuid: CPU-XX 2025-09-07T06:40:33.8128168Z Marketing Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:40:33.8128522Z Vendor Name: CPU 2025-09-07T06:40:33.8128859Z Feature: None specified 2025-09-07T06:40:33.8129195Z Profile: FULL_PROFILE 2025-09-07T06:40:33.8129542Z Float Round Mode: NEAR 2025-09-07T06:40:33.8129888Z Max Queue Number: 0(0x0) 2025-09-07T06:40:33.8130227Z Queue Min Size: 0(0x0) 2025-09-07T06:40:33.8130557Z Queue Max Size: 0(0x0) 2025-09-07T06:40:33.8130884Z Queue Type: MULTI 2025-09-07T06:40:33.8131188Z Node: 1 2025-09-07T06:40:33.8131507Z Device Type: CPU 2025-09-07T06:40:33.8131803Z Cache Info: 2025-09-07T06:40:33.8132052Z L1: 32768(0x8000) KB 2025-09-07T06:40:33.8132345Z Chip ID: 0(0x0) 2025-09-07T06:40:33.8132668Z ASIC Revision: 0(0x0) 2025-09-07T06:40:33.8133138Z Cacheline Size: 64(0x40) 2025-09-07T06:40:33.8133484Z Max Clock Freq. (MHz): 2000 2025-09-07T06:40:33.8133822Z BDFID: 0 2025-09-07T06:40:33.8134239Z Internal Node ID: 1 2025-09-07T06:40:33.8134598Z Compute Unit: 64 2025-09-07T06:40:33.8134938Z SIMDs per CU: 0 2025-09-07T06:40:33.8135280Z Shader Engines: 0 2025-09-07T06:40:33.8135633Z Shader Arrs. per Eng.: 0 2025-09-07T06:40:33.8148971Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:40:33.8149346Z Memory Properties: 2025-09-07T06:40:33.8149611Z Features: None 2025-09-07T06:40:33.8149873Z Pool Info: 2025-09-07T06:40:33.8150126Z Pool 1 2025-09-07T06:40:33.8150419Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8150784Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:40:33.8151138Z Allocatable: TRUE 2025-09-07T06:40:33.8151494Z Alloc Granule: 4KB 2025-09-07T06:40:33.8151869Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8152240Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8152822Z Accessible by all: TRUE 2025-09-07T06:40:33.8153135Z Pool 2 2025-09-07T06:40:33.8153432Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8153779Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:40:33.8154123Z Allocatable: TRUE 2025-09-07T06:40:33.8154473Z Alloc Granule: 4KB 2025-09-07T06:40:33.8154828Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8155194Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8155536Z Accessible by all: TRUE 2025-09-07T06:40:33.8155845Z Pool 3 2025-09-07T06:40:33.8156129Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:40:33.8156457Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:40:33.8156791Z Allocatable: TRUE 2025-09-07T06:40:33.8157130Z Alloc Granule: 4KB 2025-09-07T06:40:33.8157489Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8157851Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8158203Z Accessible by all: TRUE 2025-09-07T06:40:33.8158521Z Pool 4 2025-09-07T06:40:33.8158801Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8159134Z Size: 528402464(0x1f7ec820) KB 2025-09-07T06:40:33.8159471Z Allocatable: TRUE 2025-09-07T06:40:33.8159819Z Alloc Granule: 4KB 2025-09-07T06:40:33.8160172Z Alloc Recommended Granule:4KB 2025-09-07T06:40:33.8160543Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8160895Z Accessible by all: TRUE 2025-09-07T06:40:33.8161206Z ISA Info: 2025-09-07T06:40:33.8161437Z ******* 2025-09-07T06:40:33.8161819Z Agent 3 2025-09-07T06:40:33.8162044Z ******* 
2025-09-07T06:40:33.8162296Z Name: gfx90a 2025-09-07T06:40:33.8162631Z Uuid: GPU-fcde9f1dc11080c7 2025-09-07T06:40:33.8162987Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8163340Z Vendor Name: AMD 2025-09-07T06:40:33.8163668Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8163993Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8164341Z Float Round Mode: NEAR 2025-09-07T06:40:33.8164691Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8165036Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8165383Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8165742Z Queue Type: MULTI 2025-09-07T06:40:33.8166062Z Node: 2 2025-09-07T06:40:33.8166379Z Device Type: GPU 2025-09-07T06:40:33.8166672Z Cache Info: 2025-09-07T06:40:33.8166919Z L1: 16(0x10) KB 2025-09-07T06:40:33.8167216Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8167513Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8167841Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8178553Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8178897Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8179217Z BDFID: 12800 2025-09-07T06:40:33.8179542Z Internal Node ID: 2 2025-09-07T06:40:33.8179876Z Compute Unit: 104 2025-09-07T06:40:33.8180212Z SIMDs per CU: 4 2025-09-07T06:40:33.8180543Z Shader Engines: 8 2025-09-07T06:40:33.8180883Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8181236Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8181589Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8181896Z Memory Properties: 2025-09-07T06:40:33.8182158Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8182477Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8182830Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8183174Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8183503Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8183781Z x 1024(0x400) 2025-09-07T06:40:33.8184058Z y 1024(0x400) 2025-09-07T06:40:33.8184334Z z 1024(0x400) 2025-09-07T06:40:33.8184649Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8185004Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8185352Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8185659Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8185910Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8186200Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8186483Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8186807Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8187363Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8187730Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8188065Z IOMMU Support:: None 2025-09-07T06:40:33.8188366Z Pool Info: 2025-09-07T06:40:33.8188604Z Pool 1 2025-09-07T06:40:33.8188903Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8189246Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8189574Z Allocatable: TRUE 2025-09-07T06:40:33.8189922Z Alloc Granule: 4KB 2025-09-07T06:40:33.8190274Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8190638Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8190993Z Accessible by all: FALSE 2025-09-07T06:40:33.8191295Z Pool 2 2025-09-07T06:40:33.8191586Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8191920Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8192254Z Allocatable: TRUE 2025-09-07T06:40:33.8192600Z Alloc Granule: 4KB 2025-09-07T06:40:33.8192962Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8193322Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8193835Z Accessible by all: FALSE 2025-09-07T06:40:33.8194134Z Pool 3 2025-09-07T06:40:33.8194407Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8194724Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8195037Z Allocatable: TRUE 2025-09-07T06:40:33.8195379Z Alloc Granule: 4KB 2025-09-07T06:40:33.8195736Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8196086Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8196435Z Accessible by all: FALSE 2025-09-07T06:40:33.8196734Z Pool 4 2025-09-07T06:40:33.8196991Z Segment: GROUP 2025-09-07T06:40:33.8197303Z Size: 64(0x40) KB 2025-09-07T06:40:33.8197618Z Allocatable: FALSE 2025-09-07T06:40:33.8197957Z Alloc Granule: 0KB 2025-09-07T06:40:33.8198310Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8198673Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8199007Z Accessible by all: FALSE 2025-09-07T06:40:33.8199297Z ISA Info: 2025-09-07T06:40:33.8199505Z ISA 1 2025-09-07T06:40:33.8199776Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8200139Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8200483Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8200827Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8201180Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8201502Z Fast f16: TRUE 2025-09-07T06:40:33.8201826Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8202135Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8202546Z x 1024(0x400) 2025-09-07T06:40:33.8202834Z y 1024(0x400) 2025-09-07T06:40:33.8203103Z z 1024(0x400) 2025-09-07T06:40:33.8203399Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8203695Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8203944Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8204215Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8204493Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8204802Z FBarrier Max Size: 32 2025-09-07T06:40:33.8205089Z ******* 2025-09-07T06:40:33.8205306Z Agent 4 2025-09-07T06:40:33.8205504Z ******* 
2025-09-07T06:40:33.8205747Z Name: gfx90a 2025-09-07T06:40:33.8206053Z Uuid: GPU-58e51c85c53e7e04 2025-09-07T06:40:33.8206390Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8206726Z Vendor Name: AMD 2025-09-07T06:40:33.8207047Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8207362Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8207687Z Float Round Mode: NEAR 2025-09-07T06:40:33.8208160Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8208479Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8208801Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8209120Z Queue Type: MULTI 2025-09-07T06:40:33.8209420Z Node: 3 2025-09-07T06:40:33.8209717Z Device Type: GPU 2025-09-07T06:40:33.8209999Z Cache Info: 2025-09-07T06:40:33.8210233Z L1: 16(0x10) KB 2025-09-07T06:40:33.8210520Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8210805Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8211118Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8211450Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8211771Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8212083Z BDFID: 13568 2025-09-07T06:40:33.8212394Z Internal Node ID: 3 2025-09-07T06:40:33.8212719Z Compute Unit: 104 2025-09-07T06:40:33.8213038Z SIMDs per CU: 4 2025-09-07T06:40:33.8213367Z Shader Engines: 8 2025-09-07T06:40:33.8213699Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8214111Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8214465Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8214767Z Memory Properties: 2025-09-07T06:40:33.8215017Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8215326Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8215681Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8216035Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8216359Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8216813Z x 1024(0x400) 2025-09-07T06:40:33.8217107Z y 1024(0x400) 2025-09-07T06:40:33.8217388Z z 1024(0x400) 2025-09-07T06:40:33.8217698Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8218045Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8218391Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8218704Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8218963Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8219250Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8219533Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8219854Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8220222Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8220585Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8220928Z IOMMU Support:: None 2025-09-07T06:40:33.8221228Z Pool Info: 2025-09-07T06:40:33.8221453Z Pool 1 2025-09-07T06:40:33.8221740Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8222072Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8222411Z Allocatable: TRUE 2025-09-07T06:40:33.8222753Z Alloc Granule: 4KB 2025-09-07T06:40:33.8223453Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8223832Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8224183Z Accessible by all: FALSE 2025-09-07T06:40:33.8224483Z Pool 2 2025-09-07T06:40:33.8224767Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8225097Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8225419Z Allocatable: TRUE 2025-09-07T06:40:33.8225764Z Alloc Granule: 4KB 2025-09-07T06:40:33.8226123Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8226483Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8226837Z Accessible by all: FALSE 2025-09-07T06:40:33.8227135Z Pool 3 2025-09-07T06:40:33.8227408Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8227731Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8228059Z Allocatable: TRUE 2025-09-07T06:40:33.8228406Z Alloc Granule: 4KB 2025-09-07T06:40:33.8228767Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8229126Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8229475Z Accessible by all: FALSE 2025-09-07T06:40:33.8229777Z Pool 4 2025-09-07T06:40:33.8230043Z Segment: GROUP 2025-09-07T06:40:33.8230359Z Size: 64(0x40) KB 2025-09-07T06:40:33.8230689Z Allocatable: FALSE 2025-09-07T06:40:33.8231042Z Alloc Granule: 0KB 2025-09-07T06:40:33.8231396Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8231760Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8232244Z Accessible by all: FALSE 2025-09-07T06:40:33.8232564Z ISA Info: 2025-09-07T06:40:33.8232792Z ISA 1 2025-09-07T06:40:33.8233084Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8233473Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8233846Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8234207Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8234575Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8234909Z Fast f16: TRUE 2025-09-07T06:40:33.8235244Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8235565Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8235866Z x 1024(0x400) 2025-09-07T06:40:33.8236149Z y 1024(0x400) 2025-09-07T06:40:33.8236436Z z 1024(0x400) 2025-09-07T06:40:33.8236750Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8237063Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8237333Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8237627Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8237920Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8238382Z FBarrier Max Size: 32 2025-09-07T06:40:33.8238684Z ******* 2025-09-07T06:40:33.8238907Z Agent 5 2025-09-07T06:40:33.8239119Z ******* 
2025-09-07T06:40:33.8239369Z Name: gfx90a 2025-09-07T06:40:33.8239685Z Uuid: GPU-4add128351c0dde4 2025-09-07T06:40:33.8240030Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8240379Z Vendor Name: AMD 2025-09-07T06:40:33.8240712Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8241050Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8241388Z Float Round Mode: NEAR 2025-09-07T06:40:33.8241731Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8242078Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8242415Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8242748Z Queue Type: MULTI 2025-09-07T06:40:33.8243061Z Node: 4 2025-09-07T06:40:33.8243381Z Device Type: GPU 2025-09-07T06:40:33.8243679Z Cache Info: 2025-09-07T06:40:33.8243933Z L1: 16(0x10) KB 2025-09-07T06:40:33.8244226Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8244531Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8244857Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8245200Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8245552Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8245872Z BDFID: 4352 2025-09-07T06:40:33.8246200Z Internal Node ID: 4 2025-09-07T06:40:33.8246544Z Compute Unit: 104 2025-09-07T06:40:33.8247009Z SIMDs per CU: 4 2025-09-07T06:40:33.8247363Z Shader Engines: 8 2025-09-07T06:40:33.8247714Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8248073Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8248434Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8248752Z Memory Properties: 2025-09-07T06:40:33.8248996Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8249319Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8249673Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8250018Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8250335Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8250610Z x 1024(0x400) 2025-09-07T06:40:33.8250897Z y 1024(0x400) 2025-09-07T06:40:33.8251178Z z 1024(0x400) 2025-09-07T06:40:33.8251483Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8251828Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8252169Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8252476Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8252721Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8253010Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8253431Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8253757Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8254195Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8254556Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8254909Z IOMMU Support:: None 2025-09-07T06:40:33.8255211Z Pool Info: 2025-09-07T06:40:33.8255444Z Pool 1 2025-09-07T06:40:33.8255732Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8256072Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8256406Z Allocatable: TRUE 2025-09-07T06:40:33.8256751Z Alloc Granule: 4KB 2025-09-07T06:40:33.8257124Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8257493Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8257837Z Accessible by all: FALSE 2025-09-07T06:40:33.8258144Z Pool 2 2025-09-07T06:40:33.8258434Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8258770Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8259104Z Allocatable: TRUE 2025-09-07T06:40:33.8259447Z Alloc Granule: 4KB 2025-09-07T06:40:33.8259807Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8260166Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8260525Z Accessible by all: FALSE 2025-09-07T06:40:33.8260830Z Pool 3 2025-09-07T06:40:33.8261103Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8261429Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8261756Z Allocatable: TRUE 2025-09-07T06:40:33.8262248Z Alloc Granule: 4KB 2025-09-07T06:40:33.8262614Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8262970Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8263328Z Accessible by all: FALSE 2025-09-07T06:40:33.8263626Z Pool 4 2025-09-07T06:40:33.8263890Z Segment: GROUP 2025-09-07T06:40:33.8264198Z Size: 64(0x40) KB 2025-09-07T06:40:33.8264516Z Allocatable: FALSE 2025-09-07T06:40:33.8264858Z Alloc Granule: 0KB 2025-09-07T06:40:33.8265211Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8265567Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8265917Z Accessible by all: FALSE 2025-09-07T06:40:33.8266222Z ISA Info: 2025-09-07T06:40:33.8266448Z ISA 1 2025-09-07T06:40:33.8266727Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8267099Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8267460Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8267821Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8268188Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8268711Z Fast f16: TRUE 2025-09-07T06:40:33.8269047Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8269374Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8269664Z x 1024(0x400) 2025-09-07T06:40:33.8269961Z y 1024(0x400) 2025-09-07T06:40:33.8270246Z z 1024(0x400) 2025-09-07T06:40:33.8270561Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8270873Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8271131Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8271420Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8271704Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8272030Z FBarrier Max Size: 32 2025-09-07T06:40:33.8272333Z ******* 2025-09-07T06:40:33.8272556Z Agent 6 2025-09-07T06:40:33.8272765Z ******* 
2025-09-07T06:40:33.8272994Z Name: gfx90a 2025-09-07T06:40:33.8273314Z Uuid: GPU-04fbe3b4a00a45d1 2025-09-07T06:40:33.8273654Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8273984Z Vendor Name: AMD 2025-09-07T06:40:33.8274308Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8274624Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8274950Z Float Round Mode: NEAR 2025-09-07T06:40:33.8275282Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8275610Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8275935Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8276253Z Queue Type: MULTI 2025-09-07T06:40:33.8276549Z Node: 5 2025-09-07T06:40:33.8276981Z Device Type: GPU 2025-09-07T06:40:33.8277271Z Cache Info: 2025-09-07T06:40:33.8277505Z L1: 16(0x10) KB 2025-09-07T06:40:33.8277788Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8278074Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8278380Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8278702Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8279032Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8279342Z BDFID: 5120 2025-09-07T06:40:33.8279656Z Internal Node ID: 5 2025-09-07T06:40:33.8279977Z Compute Unit: 104 2025-09-07T06:40:33.8280292Z SIMDs per CU: 4 2025-09-07T06:40:33.8280616Z Shader Engines: 8 2025-09-07T06:40:33.8280949Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8281292Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8281634Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8281934Z Memory Properties: 2025-09-07T06:40:33.8282177Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8282483Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8282815Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8283280Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8283586Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8283843Z x 1024(0x400) 2025-09-07T06:40:33.8284113Z y 1024(0x400) 2025-09-07T06:40:33.8284380Z z 1024(0x400) 2025-09-07T06:40:33.8284675Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8285007Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8285333Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8285628Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8285865Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8286141Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8286418Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8286735Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8287094Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8287442Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8287771Z IOMMU Support:: None 2025-09-07T06:40:33.8288063Z Pool Info: 2025-09-07T06:40:33.8288276Z Pool 1 2025-09-07T06:40:33.8288549Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8288875Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8289196Z Allocatable: TRUE 2025-09-07T06:40:33.8289527Z Alloc Granule: 4KB 2025-09-07T06:40:33.8289875Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8290233Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8290575Z Accessible by all: FALSE 2025-09-07T06:40:33.8290870Z Pool 2 2025-09-07T06:40:33.8291140Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8291588Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8291909Z Allocatable: TRUE 2025-09-07T06:40:33.8292238Z Alloc Granule: 4KB 2025-09-07T06:40:33.8292582Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8292928Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8293265Z Accessible by all: FALSE 2025-09-07T06:40:33.8293556Z Pool 3 2025-09-07T06:40:33.8293897Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8294219Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8294533Z Allocatable: TRUE 2025-09-07T06:40:33.8294861Z Alloc Granule: 4KB 2025-09-07T06:40:33.8295211Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8295561Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8295899Z Accessible by all: FALSE 2025-09-07T06:40:33.8296194Z Pool 4 2025-09-07T06:40:33.8296446Z Segment: GROUP 2025-09-07T06:40:33.8296745Z Size: 64(0x40) KB 2025-09-07T06:40:33.8297054Z Allocatable: FALSE 2025-09-07T06:40:33.8297388Z Alloc Granule: 0KB 2025-09-07T06:40:33.8297900Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8298254Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8298596Z Accessible by all: FALSE 2025-09-07T06:40:33.8298889Z ISA Info: 2025-09-07T06:40:33.8299112Z ISA 1 2025-09-07T06:40:33.8299386Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8299752Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8300101Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8300452Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8300795Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8301121Z Fast f16: TRUE 2025-09-07T06:40:33.8301447Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8301764Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8302037Z x 1024(0x400) 2025-09-07T06:40:33.8302310Z y 1024(0x400) 2025-09-07T06:40:33.8302582Z z 1024(0x400) 2025-09-07T06:40:33.8302877Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8303183Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8303435Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8303709Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8303982Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8304291Z FBarrier Max Size: 32 2025-09-07T06:40:33.8304585Z ******* 2025-09-07T06:40:33.8304785Z Agent 7 2025-09-07T06:40:33.8304984Z ******* 
2025-09-07T06:40:33.8305215Z Name: gfx90a 2025-09-07T06:40:33.8305524Z Uuid: GPU-d5bef60d28576f7f 2025-09-07T06:40:33.8306012Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8306353Z Vendor Name: AMD 2025-09-07T06:40:33.8306670Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8306991Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8307314Z Float Round Mode: NEAR 2025-09-07T06:40:33.8307639Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8307964Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8308284Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8308598Z Queue Type: MULTI 2025-09-07T06:40:33.8308897Z Node: 6 2025-09-07T06:40:33.8309223Z Device Type: GPU 2025-09-07T06:40:33.8309516Z Cache Info: 2025-09-07T06:40:33.8309762Z L1: 16(0x10) KB 2025-09-07T06:40:33.8310045Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8310335Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8310653Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8310982Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8311308Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8311622Z BDFID: 44544 2025-09-07T06:40:33.8312074Z Internal Node ID: 6 2025-09-07T06:40:33.8312399Z Compute Unit: 104 2025-09-07T06:40:33.8312717Z SIMDs per CU: 4 2025-09-07T06:40:33.8313036Z Shader Engines: 8 2025-09-07T06:40:33.8313373Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8313714Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8314059Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8314366Z Memory Properties: 2025-09-07T06:40:33.8314616Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8314921Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8315251Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8315589Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8315899Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8316162Z x 1024(0x400) 2025-09-07T06:40:33.8316434Z y 1024(0x400) 2025-09-07T06:40:33.8316702Z z 1024(0x400) 2025-09-07T06:40:33.8317003Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8317334Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8317665Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8317957Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8318199Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8318470Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8318739Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8319078Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8319436Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8319788Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8320120Z IOMMU Support:: None 2025-09-07T06:40:33.8320550Z Pool Info: 2025-09-07T06:40:33.8320772Z Pool 1 2025-09-07T06:40:33.8321047Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8321369Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8321686Z Allocatable: TRUE 2025-09-07T06:40:33.8322014Z Alloc Granule: 4KB 2025-09-07T06:40:33.8322358Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8322709Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8323050Z Accessible by all: FALSE 2025-09-07T06:40:33.8323342Z Pool 2 2025-09-07T06:40:33.8323608Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8323928Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8324239Z Allocatable: TRUE 2025-09-07T06:40:33.8324566Z Alloc Granule: 4KB 2025-09-07T06:40:33.8324905Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8325248Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8325587Z Accessible by all: FALSE 2025-09-07T06:40:33.8325877Z Pool 3 2025-09-07T06:40:33.8326139Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8326606Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8326915Z Allocatable: TRUE 2025-09-07T06:40:33.8327242Z Alloc Granule: 4KB 2025-09-07T06:40:33.8327601Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8327962Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8328299Z Accessible by all: FALSE 2025-09-07T06:40:33.8328594Z Pool 4 2025-09-07T06:40:33.8328856Z Segment: GROUP 2025-09-07T06:40:33.8329163Z Size: 64(0x40) KB 2025-09-07T06:40:33.8329483Z Allocatable: FALSE 2025-09-07T06:40:33.8329815Z Alloc Granule: 0KB 2025-09-07T06:40:33.8330166Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8330506Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8330840Z Accessible by all: FALSE 2025-09-07T06:40:33.8331134Z ISA Info: 2025-09-07T06:40:33.8331347Z ISA 1 2025-09-07T06:40:33.8331622Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8331985Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8332335Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8332678Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8333028Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8333355Z Fast f16: TRUE 2025-09-07T06:40:33.8333682Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8334079Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8334358Z x 1024(0x400) 2025-09-07T06:40:33.8334633Z y 1024(0x400) 2025-09-07T06:40:33.8334902Z z 1024(0x400) 2025-09-07T06:40:33.8335360Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8335669Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8335922Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8336201Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8336474Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8336784Z FBarrier Max Size: 32 2025-09-07T06:40:33.8337073Z ******* 2025-09-07T06:40:33.8337283Z Agent 8 2025-09-07T06:40:33.8337483Z ******* 
2025-09-07T06:40:33.8337720Z Name: gfx90a 2025-09-07T06:40:33.8338031Z Uuid: GPU-8a1a325e7a817ddd 2025-09-07T06:40:33.8338363Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8338701Z Vendor Name: AMD 2025-09-07T06:40:33.8339033Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8339358Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8339701Z Float Round Mode: NEAR 2025-09-07T06:40:33.8340039Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8340370Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8340691Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8341170Z Queue Type: MULTI 2025-09-07T06:40:33.8341473Z Node: 7 2025-09-07T06:40:33.8341778Z Device Type: GPU 2025-09-07T06:40:33.8342070Z Cache Info: 2025-09-07T06:40:33.8342319Z L1: 16(0x10) KB 2025-09-07T06:40:33.8342605Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8342893Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8343208Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8343539Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8343866Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8344176Z BDFID: 45824 2025-09-07T06:40:33.8344488Z Internal Node ID: 7 2025-09-07T06:40:33.8344819Z Compute Unit: 104 2025-09-07T06:40:33.8345138Z SIMDs per CU: 4 2025-09-07T06:40:33.8345462Z Shader Engines: 8 2025-09-07T06:40:33.8345794Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8346136Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8346481Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8346783Z Memory Properties: 2025-09-07T06:40:33.8347019Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8347328Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8347660Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8347993Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8348305Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8348566Z x 1024(0x400) 2025-09-07T06:40:33.8348846Z y 1024(0x400) 2025-09-07T06:40:33.8349111Z z 1024(0x400) 2025-09-07T06:40:33.8349540Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8349880Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8350210Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8350504Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8350744Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8351013Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8351282Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8351598Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8351962Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8352314Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8352646Z IOMMU Support:: None 2025-09-07T06:40:33.8352938Z Pool Info: 2025-09-07T06:40:33.8353159Z Pool 1 2025-09-07T06:40:33.8353438Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8353773Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8354100Z Allocatable: TRUE 2025-09-07T06:40:33.8354435Z Alloc Granule: 4KB 2025-09-07T06:40:33.8354785Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8355144Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8355481Z Accessible by all: FALSE 2025-09-07T06:40:33.8355913Z Pool 2 2025-09-07T06:40:33.8356185Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8356505Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8356817Z Allocatable: TRUE 2025-09-07T06:40:33.8357147Z Alloc Granule: 4KB 2025-09-07T06:40:33.8357490Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8357840Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8358178Z Accessible by all: FALSE 2025-09-07T06:40:33.8358469Z Pool 3 2025-09-07T06:40:33.8358729Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8359038Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8359355Z Allocatable: TRUE 2025-09-07T06:40:33.8359682Z Alloc Granule: 4KB 2025-09-07T06:40:33.8360025Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8360370Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8360709Z Accessible by all: FALSE 2025-09-07T06:40:33.8361000Z Pool 4 2025-09-07T06:40:33.8361253Z Segment: GROUP 2025-09-07T06:40:33.8361558Z Size: 64(0x40) KB 2025-09-07T06:40:33.8361868Z Allocatable: FALSE 2025-09-07T06:40:33.8362200Z Alloc Granule: 0KB 2025-09-07T06:40:33.8362549Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8362900Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8363236Z Accessible by all: FALSE 2025-09-07T06:40:33.8363530Z ISA Info: 2025-09-07T06:40:33.8363742Z ISA 1 2025-09-07T06:40:33.8364145Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8364517Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8364859Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8365201Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8365557Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8365884Z Fast f16: TRUE 2025-09-07T06:40:33.8366209Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8366525Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8366807Z x 1024(0x400) 2025-09-07T06:40:33.8367089Z y 1024(0x400) 2025-09-07T06:40:33.8367359Z z 1024(0x400) 2025-09-07T06:40:33.8367666Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8367961Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8368213Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8368485Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8368761Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8369070Z FBarrier Max Size: 32 2025-09-07T06:40:33.8369358Z ******* 2025-09-07T06:40:33.8369567Z Agent 9 2025-09-07T06:40:33.8369904Z ******* 
2025-09-07T06:40:33.8370136Z Name: gfx90a 2025-09-07T06:40:33.8370444Z Uuid: GPU-26deaaad0d24bc07 2025-09-07T06:40:33.8370779Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8371119Z Vendor Name: AMD 2025-09-07T06:40:33.8371444Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8371763Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8372084Z Float Round Mode: NEAR 2025-09-07T06:40:33.8372415Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8372736Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8373054Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8373368Z Queue Type: MULTI 2025-09-07T06:40:33.8373674Z Node: 8 2025-09-07T06:40:33.8374049Z Device Type: GPU 2025-09-07T06:40:33.8374332Z Cache Info: 2025-09-07T06:40:33.8374565Z L1: 16(0x10) KB 2025-09-07T06:40:33.8374845Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8375129Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8375440Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8375762Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8376089Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8376396Z BDFID: 36352 2025-09-07T06:40:33.8376709Z Internal Node ID: 8 2025-09-07T06:40:33.8377034Z Compute Unit: 104 2025-09-07T06:40:33.8377356Z SIMDs per CU: 4 2025-09-07T06:40:33.8377678Z Shader Engines: 8 2025-09-07T06:40:33.8378011Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8378534Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8378885Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8379188Z Memory Properties: 2025-09-07T06:40:33.8379435Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8379740Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8380074Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8380407Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8380708Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8380963Z x 1024(0x400) 2025-09-07T06:40:33.8381234Z y 1024(0x400) 2025-09-07T06:40:33.8381495Z z 1024(0x400) 2025-09-07T06:40:33.8381794Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8382139Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8387970Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8388310Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8388590Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8388890Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8389179Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8389501Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8389882Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8390470Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8390809Z IOMMU Support:: None 2025-09-07T06:40:33.8391112Z Pool Info: 2025-09-07T06:40:33.8391336Z Pool 1 2025-09-07T06:40:33.8391627Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8391955Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8392277Z Allocatable: TRUE 2025-09-07T06:40:33.8392609Z Alloc Granule: 4KB 2025-09-07T06:40:33.8392960Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8393321Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8393660Z Accessible by all: FALSE 2025-09-07T06:40:33.8393956Z Pool 2 2025-09-07T06:40:33.8394229Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8394561Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8394885Z Allocatable: TRUE 2025-09-07T06:40:33.8395218Z Alloc Granule: 4KB 2025-09-07T06:40:33.8395572Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8395933Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8396273Z Accessible by all: FALSE 2025-09-07T06:40:33.8396567Z Pool 3 2025-09-07T06:40:33.8396835Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8397152Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8397461Z Allocatable: TRUE 2025-09-07T06:40:33.8397798Z Alloc Granule: 4KB 2025-09-07T06:40:33.8398142Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8398492Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8398976Z Accessible by all: FALSE 2025-09-07T06:40:33.8399277Z Pool 4 2025-09-07T06:40:33.8399528Z Segment: GROUP 2025-09-07T06:40:33.8399828Z Size: 64(0x40) KB 2025-09-07T06:40:33.8400139Z Allocatable: FALSE 2025-09-07T06:40:33.8400463Z Alloc Granule: 0KB 2025-09-07T06:40:33.8400809Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8401157Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8401496Z Accessible by all: FALSE 2025-09-07T06:40:33.8401791Z ISA Info: 2025-09-07T06:40:33.8402006Z ISA 1 2025-09-07T06:40:33.8402283Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8402654Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8403000Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8403342Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8403689Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8404012Z Fast f16: TRUE 2025-09-07T06:40:33.8404335Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8404648Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8404929Z x 1024(0x400) 2025-09-07T06:40:33.8405349Z y 1024(0x400) 2025-09-07T06:40:33.8405612Z z 1024(0x400) 2025-09-07T06:40:33.8405917Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8406220Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8406472Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8406749Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8407026Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8407339Z FBarrier Max Size: 32 2025-09-07T06:40:33.8407628Z ******* 2025-09-07T06:40:33.8407835Z Agent 10 2025-09-07T06:40:33.8408031Z ******* 
2025-09-07T06:40:33.8408267Z Name: gfx90a 2025-09-07T06:40:33.8408581Z Uuid: GPU-750f6a6a723531b8 2025-09-07T06:40:33.8408919Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:33.8409254Z Vendor Name: AMD 2025-09-07T06:40:33.8409567Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:33.8409889Z Profile: BASE_PROFILE 2025-09-07T06:40:33.8410215Z Float Round Mode: NEAR 2025-09-07T06:40:33.8410547Z Max Queue Number: 128(0x80) 2025-09-07T06:40:33.8410873Z Queue Min Size: 64(0x40) 2025-09-07T06:40:33.8411188Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:33.8411512Z Queue Type: MULTI 2025-09-07T06:40:33.8411826Z Node: 9 2025-09-07T06:40:33.8412145Z Device Type: GPU 2025-09-07T06:40:33.8412436Z Cache Info: 2025-09-07T06:40:33.8412688Z L1: 16(0x10) KB 2025-09-07T06:40:33.8412974Z L2: 8192(0x2000) KB 2025-09-07T06:40:33.8413392Z Chip ID: 29708(0x740c) 2025-09-07T06:40:33.8413716Z ASIC Revision: 1(0x1) 2025-09-07T06:40:33.8414124Z Cacheline Size: 128(0x80) 2025-09-07T06:40:33.8414453Z Max Clock Freq. (MHz): 1700 2025-09-07T06:40:33.8414761Z BDFID: 37632 2025-09-07T06:40:33.8415073Z Internal Node ID: 9 2025-09-07T06:40:33.8415394Z Compute Unit: 104 2025-09-07T06:40:33.8415715Z SIMDs per CU: 4 2025-09-07T06:40:33.8416036Z Shader Engines: 8 2025-09-07T06:40:33.8416372Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:33.8416715Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:33.8417062Z Coherent Host Access: FALSE 2025-09-07T06:40:33.8417373Z Memory Properties: 2025-09-07T06:40:33.8417615Z Features: KERNEL_DISPATCH 2025-09-07T06:40:33.8417929Z Fast F16 Operation: TRUE 2025-09-07T06:40:33.8418263Z Wavefront Size: 64(0x40) 2025-09-07T06:40:33.8418598Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8418912Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8419170Z x 1024(0x400) 2025-09-07T06:40:33.8419612Z y 1024(0x400) 2025-09-07T06:40:33.8419882Z z 1024(0x400) 2025-09-07T06:40:33.8420181Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:33.8420518Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:33.8420854Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8421150Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8421388Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8421662Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8421937Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8422245Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:33.8422609Z Packet Processor uCode:: 92 2025-09-07T06:40:33.8422955Z SDMA engine uCode:: 9 2025-09-07T06:40:33.8423292Z IOMMU Support:: None 2025-09-07T06:40:33.8423579Z Pool Info: 2025-09-07T06:40:33.8423800Z Pool 1 2025-09-07T06:40:33.8424076Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:33.8424404Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8424728Z Allocatable: TRUE 2025-09-07T06:40:33.8425056Z Alloc Granule: 4KB 2025-09-07T06:40:33.8425398Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8425751Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8426088Z Accessible by all: FALSE 2025-09-07T06:40:33.8426375Z Pool 2 2025-09-07T06:40:33.8426639Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:33.8426961Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8427269Z Allocatable: TRUE 2025-09-07T06:40:33.8427593Z Alloc Granule: 4KB 2025-09-07T06:40:33.8428088Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8428437Z Alloc Alignment: 4KB 
2025-09-07T06:40:33.8428771Z Accessible by all: FALSE 2025-09-07T06:40:33.8429067Z Pool 3 2025-09-07T06:40:33.8429330Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:33.8429636Z Size: 67092480(0x3ffc000) KB 2025-09-07T06:40:33.8429943Z Allocatable: TRUE 2025-09-07T06:40:33.8430265Z Alloc Granule: 4KB 2025-09-07T06:40:33.8430606Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:33.8430955Z Alloc Alignment: 4KB 2025-09-07T06:40:33.8431286Z Accessible by all: FALSE 2025-09-07T06:40:33.8431574Z Pool 4 2025-09-07T06:40:33.8431824Z Segment: GROUP 2025-09-07T06:40:33.8432127Z Size: 64(0x40) KB 2025-09-07T06:40:33.8432435Z Allocatable: FALSE 2025-09-07T06:40:33.8432764Z Alloc Granule: 0KB 2025-09-07T06:40:33.8433110Z Alloc Recommended Granule:0KB 2025-09-07T06:40:33.8433458Z Alloc Alignment: 0KB 2025-09-07T06:40:33.8433794Z Accessible by all: FALSE 2025-09-07T06:40:33.8434221Z ISA Info: 2025-09-07T06:40:33.8434430Z ISA 1 2025-09-07T06:40:33.8434721Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-09-07T06:40:33.8435083Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:33.8435428Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:33.8435769Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8436113Z Default Rounding Mode: NEAR 2025-09-07T06:40:33.8436438Z Fast f16: TRUE 2025-09-07T06:40:33.8436756Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:33.8437067Z Workgroup Max Size per Dimension: 2025-09-07T06:40:33.8437342Z x 1024(0x400) 2025-09-07T06:40:33.8437624Z y 1024(0x400) 2025-09-07T06:40:33.8437892Z z 1024(0x400) 2025-09-07T06:40:33.8438192Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:33.8438486Z Grid Max Size per Dimension: 2025-09-07T06:40:33.8438740Z x 4294967295(0xffffffff) 2025-09-07T06:40:33.8439019Z y 4294967295(0xffffffff) 2025-09-07T06:40:33.8439295Z z 4294967295(0xffffffff) 2025-09-07T06:40:33.8439603Z FBarrier Max Size: 32 2025-09-07T06:40:33.8439894Z *** Done *** 2025-09-07T06:40:33.8440101Z + rocminfo 2025-09-07T06:40:33.8440296Z + 
grep -E 'Name:.*\sgfx|Marketing' 2025-09-07T06:40:34.0619139Z Marketing Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:40:34.0619879Z Marketing Name: AMD EPYC 7713 64-Core Processor 2025-09-07T06:40:34.0620521Z Name: gfx90a 2025-09-07T06:40:34.0621122Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0621707Z Name: gfx90a 2025-09-07T06:40:34.0622669Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0623385Z Name: gfx90a 2025-09-07T06:40:34.0624068Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0624753Z Name: gfx90a 2025-09-07T06:40:34.0625409Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0626064Z Name: gfx90a 2025-09-07T06:40:34.0626718Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0627230Z Name: gfx90a 2025-09-07T06:40:34.0627619Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0627939Z Name: gfx90a 2025-09-07T06:40:34.0628268Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0628594Z Name: gfx90a 2025-09-07T06:40:34.0628927Z Marketing Name: AMD Instinct MI250X/MI250 2025-09-07T06:40:34.0794161Z + MAYBE_ROCM=rocm/ 2025-09-07T06:40:34.0794937Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-09-07T06:40:34.0795442Z + [[ linux-jammy-rocm-py3.10 != *-bazel-* ]] 2025-09-07T06:40:34.0795913Z + pip_install ninja==1.10.2 2025-09-07T06:40:34.0796464Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-09-07T06:40:34.0797164Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-09-07T06:40:41.6561931Z Collecting ninja==1.10.2 2025-09-07T06:40:41.7150798Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-09-07T06:40:41.7297750Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-09-07T06:40:42.0948368Z Installing collected packages: ninja 2025-09-07T06:40:42.0949347Z Attempting uninstall: ninja 2025-09-07T06:40:42.0955690Z Found existing installation: 
ninja 1.11.1.3 2025-09-07T06:40:42.0977944Z Uninstalling ninja-1.11.1.3: 2025-09-07T06:40:42.1067946Z Successfully uninstalled ninja-1.11.1.3 2025-09-07T06:40:42.1396231Z Successfully installed ninja-1.10.2 2025-09-07T06:40:42.1859109Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:40:42.1861885Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:40:42.1863508Z + [[ linux-jammy-rocm-py3.10 == *aarch64* ]] 2025-09-07T06:40:42.1864027Z + [[ linux-jammy-rocm-py3.10 == *asan* ]] 2025-09-07T06:40:42.1864540Z + [[ linux-jammy-rocm-py3.10 == *-debug* ]] 2025-09-07T06:40:42.1865073Z + [[ linux-jammy-rocm-py3.10 != *-bazel-* ]] 2025-09-07T06:40:42.1865794Z + echo 'We are not in debug mode: linux-jammy-rocm-py3.10. Expect the assertion to pass' 2025-09-07T06:40:42.1866729Z We are not in debug mode: linux-jammy-rocm-py3.10. 
Expect the assertion to pass 2025-09-07T06:40:42.1867503Z + cd test 2025-09-07T06:40:42.1868105Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-09-07T06:40:43.7095278Z + [[ slow == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-09-07T06:40:43.7095979Z + [[ slow == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-09-07T06:40:43.7096609Z + [[ slow == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-09-07T06:40:43.7100455Z + DYNAMO_BENCHMARK_FLAGS=() 2025-09-07T06:40:43.7100947Z + [[ slow == *pr_time_benchmarks* ]] 2025-09-07T06:40:43.7101405Z + [[ slow == *dynamo_eager* ]] 2025-09-07T06:40:43.7101805Z + [[ slow == *aot_eager* ]] 2025-09-07T06:40:43.7102172Z + [[ slow == *aot_inductor* ]] 2025-09-07T06:40:43.7102594Z + [[ slow == *max_autotune_inductor* ]] 2025-09-07T06:40:43.7103599Z + [[ slow == *inductor* ]] 2025-09-07T06:40:43.7104035Z + [[ slow == *dynamic* ]] 2025-09-07T06:40:43.7104441Z + [[ slow == *cpu* ]] 2025-09-07T06:40:43.7104861Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-09-07T06:40:43.7120715Z + [[ linux-jammy-rocm-py3.10 == *libtorch* ]] 2025-09-07T06:40:43.7121048Z + [[ linux-jammy-rocm-py3.10 == *-bazel-* ]] 2025-09-07T06:40:43.7126271Z + cd test 2025-09-07T06:40:43.7127706Z + python -c 'import torch; print(torch.__config__.show())' 2025-09-07T06:40:45.2032773Z PyTorch built with: 2025-09-07T06:40:45.2033066Z - GCC 11.4 2025-09-07T06:40:45.2033318Z - C++ Version: 201703 2025-09-07T06:40:45.2033974Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T06:40:45.2035021Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T06:40:45.2035657Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-09-07T06:40:45.2036169Z - LAPACK is enabled (usually provided by MKL) 2025-09-07T06:40:45.2036667Z - NNPACK is enabled 2025-09-07T06:40:45.2037048Z - CPU capability usage: AVX2 2025-09-07T06:40:45.2037464Z - HIP Runtime 6.4.43484 2025-09-07T06:40:45.2037834Z - MIOpen 3.4.0 2025-09-07T06:40:45.2038168Z - Magma 2.7.2 2025-09-07T06:40:45.2044564Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=93fb23d6fae7c4e82c4239a1033e522088742634, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_FBGEMM_GENAI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-09-07T06:40:45.2048904Z 2025-09-07T06:40:45.4865252Z + cd test 2025-09-07T06:40:45.4866046Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-09-07T06:40:46.7453818Z ATen/Parallel: 2025-09-07T06:40:46.7454292Z at::get_num_threads() : 128 2025-09-07T06:40:46.7454633Z at::get_num_interop_threads() : 128 2025-09-07T06:40:46.7454986Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-09-07T06:40:46.7455309Z omp_get_max_threads() : 128 2025-09-07T06:40:46.7455914Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T06:40:46.7456547Z mkl_get_max_threads() : 128 2025-09-07T06:40:46.7456960Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T06:40:46.7457453Z std::thread::hardware_concurrency() : 128 2025-09-07T06:40:46.7457796Z Environment variables: 2025-09-07T06:40:46.7458068Z OMP_NUM_THREADS : [not set] 2025-09-07T06:40:46.7458344Z MKL_NUM_THREADS : [not set] 2025-09-07T06:40:46.7458633Z ATen parallel backend: OpenMP 2025-09-07T06:40:46.7458841Z 2025-09-07T06:40:47.0038554Z + [[ slow == *numpy_2* ]] 2025-09-07T06:40:47.0039100Z + [[ linux-jammy-rocm-py3.10 == *aarch64* ]] 2025-09-07T06:40:47.0039646Z + [[ slow == *backward* ]] 2025-09-07T06:40:47.0040043Z + [[ slow == *xla* ]] 2025-09-07T06:40:47.0040391Z + [[ slow == *vllm* ]] 2025-09-07T06:40:47.0040759Z + [[ slow == *executorch* ]] 2025-09-07T06:40:47.0041173Z + [[ slow == \j\i\t\_\l\e\g\a\c\y ]] 2025-09-07T06:40:47.0041661Z + [[ linux-jammy-rocm-py3.10 == *libtorch* ]] 2025-09-07T06:40:47.0042142Z + [[ slow == distributed ]] 2025-09-07T06:40:47.0043283Z + [[ slow == *operator_benchmark* ]] 2025-09-07T06:40:47.0043795Z + [[ slow == *inductor_distributed* ]] 2025-09-07T06:40:47.0044268Z + [[ slow == *inductor-halide* ]] 2025-09-07T06:40:47.0044731Z + [[ slow == *inductor-triton-cpu* ]] 2025-09-07T06:40:47.0045208Z + [[ slow == *inductor-micro-benchmark* ]] 2025-09-07T06:40:47.0045672Z + [[ slow == *huggingface* ]] 2025-09-07T06:40:47.0046052Z + [[ slow == *timm* ]] 2025-09-07T06:40:47.0046397Z + [[ slow == cachebench ]] 2025-09-07T06:40:47.0046774Z + [[ slow == verify_cachebench ]] 2025-09-07T06:40:47.0047168Z + [[ slow == *torchbench* ]] 2025-09-07T06:40:47.0047581Z + [[ slow == *inductor_cpp_wrapper* ]] 2025-09-07T06:40:47.0048000Z + [[ slow == *inductor* ]] 
2025-09-07T06:40:47.0048360Z + [[ slow == *einops* ]] 2025-09-07T06:40:47.0048732Z + [[ slow == *dynamo_wrapped* ]] 2025-09-07T06:40:47.0049111Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-09-07T06:40:47.0049374Z + [[ -n '' ]] 2025-09-07T06:40:47.0049555Z + [[ 1 == 1 ]] 2025-09-07T06:40:47.0049746Z + [[ 2 -gt 1 ]] 2025-09-07T06:40:47.0049963Z + test_lazy_tensor_meta_reference_disabled 2025-09-07T06:40:47.0050302Z + export TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1 2025-09-07T06:40:47.0050667Z + TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1 2025-09-07T06:40:47.0051032Z + echo 'Testing lazy tensor operations without meta reference' 2025-09-07T06:40:47.0051405Z Testing lazy tensor operations without meta reference 2025-09-07T06:40:47.0051806Z + python test/run_test.py --include lazy/test_ts_opinfo.py --verbose 2025-09-07T06:40:49.6332656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:40:49.6334999Z import pkg_resources 2025-09-07T06:40:51.9689458Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-09-07T06:40:52.1685607Z Ignoring disabled issues: [''] 2025-09-07T06:40:52.1826004Z Found test times from artifacts 2025-09-07T06:40:52.2376965Z Found test times from artifacts 2025-09-07T06:40:52.2391950Z Running all tests 2025-09-07T06:40:52.2395064Z Running parallel tests on 8 processes 2025-09-07T06:40:52.2395420Z Name: tests to run (est. time: 0.0min) 2025-09-07T06:40:52.2395699Z Serial tests (0): 2025-09-07T06:40:52.2395909Z Parallel tests (1): 2025-09-07T06:40:52.2396134Z lazy/test_ts_opinfo 1/1 2025-09-07T06:40:52.2396393Z Name: excluded (est. 
time: 0.0min) 2025-09-07T06:40:52.2396625Z Serial tests (0): 2025-09-07T06:40:52.2396823Z Parallel tests (0): 2025-09-07T06:40:52.2397927Z Running lazy/test_ts_opinfo 1/1 ... [2025-09-07 06:40:52.239685] 2025-09-07T06:40:52.2398272Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:40:52.2401728Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_ts_opinfo.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:40:52.240009] 2025-09-07T06:40:56.7609418Z 2025-09-07T06:40:56.7610813Z lazy/test_ts_opinfo 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_ts_opinfo_1.1_e596c23b3d385351_.log 2025-09-07T06:40:56.7611980Z Running 0 items in this shard: 2025-09-07T06:40:56.7612298Z 2025-09-07T06:40:56.7612717Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T06:40:56.7613606Z Uploading artifacts took 0.00 seconds 2025-09-07T06:40:59.9350491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:40:59.9352120Z import pkg_resources 2025-09-07T06:41:00.0014530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T06:41:00.0016557Z import pkg_resources 2025-09-07T06:41:00.0023318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:41:00.0025302Z import pkg_resources 2025-09-07T06:41:00.0223353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:41:00.0225436Z import pkg_resources 2025-09-07T06:41:00.0322728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:41:00.0325267Z import pkg_resources 2025-09-07T06:41:00.0329176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:41:00.0330528Z import pkg_resources 2025-09-07T06:41:00.0439013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. 
Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:41:00.0440959Z import pkg_resources 2025-09-07T06:41:00.0448248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:41:00.0449621Z import pkg_resources 2025-09-07T06:41:00.6637873Z Running lazy/test_ts_opinfo 1/1 ... [2025-09-07 06:41:00.663485] 2025-09-07T06:41:00.6638637Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:41:00.6641067Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_ts_opinfo.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:41:00.663846] 2025-09-07T06:41:05.2851543Z 2025-09-07T06:41:05.2852656Z lazy/test_ts_opinfo 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_ts_opinfo_1.1_f35ba29b777452bc_.log 2025-09-07T06:41:05.2856328Z Running 5 items in this shard: test/lazy/test_ts_opinfo.py::TestLazyTensor::testConvolutionBackward, test/lazy/test_ts_opinfo.py::TestLazyTensor::test_tensor_ctr, test/lazy/test_ts_opinfo.py::TestLazyTensor::test_view_mark_step_preserved, test/lazy/test_ts_opinfo.py::TestLazyDynamicOps::test_adaptiveavgpool3d_dynamic, test/lazy/test_ts_opinfo.py::TestLazyDynamicOps::test_nonzero_dynamic 2025-09-07T06:41:05.2858668Z 2025-09-07T06:41:06.0349575Z Running test batch 'tests to run' cost 13.8 seconds 2025-09-07T06:41:06.5966414Z 2025-09-07T06:41:06.5966639Z real 0m19.592s 2025-09-07T06:41:06.5967020Z user 0m45.131s 2025-09-07T06:41:06.5967353Z sys 1m7.994s 2025-09-07T06:41:06.5967859Z + export -n TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE 
2025-09-07T06:41:06.5968431Z + test_without_numpy 2025-09-07T06:41:06.5976172Z ++ dirname .ci/pytorch/test.sh 2025-09-07T06:41:06.5992859Z + pushd .ci/pytorch 2025-09-07T06:41:06.5993279Z ~/pytorch/.ci/pytorch ~/pytorch 2025-09-07T06:41:06.5994296Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');from unittest import TestCase;import torch;x=torch.randn(3,3);TestCase().assertRaises(RuntimeError, lambda: x.numpy())' 2025-09-07T06:41:07.4002525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.) 2025-09-07T06:41:07.4004518Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-09-07T06:41:08.2100908Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');import torch;print(torch.tensor([torch.tensor(0.), torch.tensor(1.)]))' 2025-09-07T06:41:09.0140533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.) 2025-09-07T06:41:09.0143136Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-09-07T06:41:09.4468106Z tensor([0., 1.]) 2025-09-07T06:41:09.7229050Z + [[ slow == *dynamo_wrapped* ]] 2025-09-07T06:41:09.7229829Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');import torch; import torch.onnx' 2025-09-07T06:41:10.5276992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.) 
2025-09-07T06:41:10.5279048Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-09-07T06:41:11.2577638Z + popd 2025-09-07T06:41:11.2578009Z ~/pytorch 2025-09-07T06:41:11.2578368Z + install_torchvision 2025-09-07T06:41:11.2578762Z + local orig_preload 2025-09-07T06:41:11.2579104Z + local commit 2025-09-07T06:41:11.2585761Z ++ get_pinned_commit vision 2025-09-07T06:41:11.2586238Z ++ cat .github/ci_commit_pins/vision.txt 2025-09-07T06:41:11.2605009Z + commit=966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:11.2605312Z + orig_preload= 2025-09-07T06:41:11.2605509Z + '[' -n '' ']' 2025-09-07T06:41:11.2605723Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-09-07T06:41:11.2606309Z + pip_build_and_install git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 dist/vision 2025-09-07T06:41:11.2607132Z + local build_target=git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:11.2607919Z + local wheel_dir=dist/vision 2025-09-07T06:41:11.2608393Z + local found_whl=0 2025-09-07T06:41:11.2608822Z + for file in "${wheel_dir}"/*.whl 2025-09-07T06:41:11.2609345Z + [[ -f dist/vision/*.whl ]] 2025-09-07T06:41:11.2609755Z + '[' 0 == 0 ']' 2025-09-07T06:41:11.2610831Z + python3 -m pip wheel --no-build-isolation --no-deps --no-use-pep517 -w dist/vision git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:11.5914650Z Collecting git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:11.5925187Z Cloning https://github.com/pytorch/vision.git (to revision 966da7e46f65d6d49df3e31214470a4fe5cc8e66) to /tmp/pip-req-build-culbs4cq 2025-09-07T06:41:11.5962495Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-culbs4cq 2025-09-07T06:41:13.6450048Z Running command git rev-parse -q --verify 'sha^966da7e46f65d6d49df3e31214470a4fe5cc8e66' 
2025-09-07T06:41:13.6482807Z Running command git fetch -q https://github.com/pytorch/vision.git 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:13.9707241Z Running command git checkout -q 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:14.4566247Z Resolved https://github.com/pytorch/vision.git to commit 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:41:17.3488643Z Preparing metadata (setup.py) ... done 2025-09-07T06:41:17.3533389Z Building wheels for collected packages: torchvision 2025-09-07T06:41:17.3625330Z DEPRECATION: Building 'torchvision' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torchvision'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T06:42:16.6907105Z Building wheel for torchvision (setup.py) ... done
2025-09-07T06:42:16.6936579Z Created wheel for torchvision: filename=torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl size=1576362 sha256=8a7ee13f3fb404706dc15676326ba77321b3dbe0e0772cc727d4edb1178ca585 2025-09-07T06:42:16.6940475Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/9c/9d/3e/42fa2d5ac6ba44a90363f8fff0fa9e712e24d4f977637c81cb 2025-09-07T06:42:16.6975938Z Successfully built torchvision 2025-09-07T06:42:16.8175341Z + for file in "${wheel_dir}"/*.whl 2025-09-07T06:42:16.8176147Z + pip_install_whl dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T06:42:16.8177190Z + args=('dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl') 2025-09-07T06:42:16.8177904Z + local args 2025-09-07T06:42:16.8178420Z + [[ dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-09-07T06:42:16.8178938Z + for path in "${args[@]}" 2025-09-07T06:42:16.8179443Z + echo 'Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl' 2025-09-07T06:42:16.8180159Z Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T06:42:16.8180972Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T06:42:17.1612352Z Processing ./dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T06:42:17.1708205Z Installing collected packages: torchvision 2025-09-07T06:42:17.6062737Z Successfully installed torchvision-0.22.0a0+966da7e 2025-09-07T06:42:17.6504280Z + '[' -n '' ']' 2025-09-07T06:42:17.6504681Z + test_python_shard 1 2025-09-07T06:42:17.6505044Z + [[ -z 2 ]] 2025-09-07T06:42:17.6505971Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 1 2 --verbose --upload-artifacts-while-running 2025-09-07T06:42:20.2568892Z
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:42:20.2571184Z import pkg_resources 2025-09-07T06:42:21.0097475Z Excluding test_cuda_nvml_based_avail on ROCm 2025-09-07T06:42:22.5581220Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-09-07T06:42:22.5715340Z Found test times from artifacts 2025-09-07T06:42:22.6281106Z Found test times from artifacts 2025-09-07T06:42:22.6294393Z Running all tests 2025-09-07T06:42:22.6721560Z Running parallel tests on 8 processes 2025-09-07T06:42:22.6738619Z Name: tests to run (est. time: 190.75min) 2025-09-07T06:42:22.6739089Z Serial tests (47): 2025-09-07T06:42:22.6739473Z test_ci_sanity_check_fail 1/1 2025-09-07T06:42:22.6739922Z test_utils 1/1 2025-09-07T06:42:22.6740263Z test_reductions 1/1 2025-09-07T06:42:22.6740645Z test_extension_utils 1/1 2025-09-07T06:42:22.6741064Z inductor/test_flex_attention 1/1 2025-09-07T06:42:22.6741514Z inductor/test_cutlass_backend 1/1 2025-09-07T06:42:22.6741982Z test_cpp_api_parity 1/1 2025-09-07T06:42:22.6742358Z test_fx 1/1 2025-09-07T06:42:22.6742718Z test_transformers_privateuse1 1/1 2025-09-07T06:42:22.6743146Z test_openreg 1/1 2025-09-07T06:42:22.6743517Z inductor/test_benchmark_fusion 1/1 2025-09-07T06:42:22.6743940Z test_show_pickle 1/1 2025-09-07T06:42:22.6744318Z test_tensorexpr 1/1 2025-09-07T06:42:22.6744700Z inductor/test_max_autotune 1/1 2025-09-07T06:42:22.6745142Z test_multiprocessing 1/1 2025-09-07T06:42:22.6745529Z test_dispatch 1/1 2025-09-07T06:42:22.6745898Z test_namedtuple_return_api 1/1 2025-09-07T06:42:22.6746329Z test_cpp_extensions_mtia_backend 1/1 2025-09-07T06:42:22.6746772Z 
test_jit_disabled 1/1 2025-09-07T06:42:22.6747133Z test_fake_tensor 1/1 2025-09-07T06:42:22.6747483Z test_cuda_trace 1/1 2025-09-07T06:42:22.6747866Z test_cpp_extensions_stream_and_event 1/1 2025-09-07T06:42:22.6748335Z test_python_dispatch 1/1 2025-09-07T06:42:22.6748736Z test_tensor_creation_ops 1/1 2025-09-07T06:42:22.6749388Z test_autograd_fallback 1/1 2025-09-07T06:42:22.6749643Z dynamo/test_fake_distributed 1/1 2025-09-07T06:42:22.6749943Z inductor/test_distributed_patterns 1/1 2025-09-07T06:42:22.6750269Z test_autocast 1/1 2025-09-07T06:42:22.6750482Z test_torch 1/1 2025-09-07T06:42:22.6750718Z functorch/test_memory_efficient_fusion 1/1 2025-09-07T06:42:22.6751007Z test_sort_and_select 1/1 2025-09-07T06:42:22.6751248Z test_cpp_extensions_jit 1/1 2025-09-07T06:42:22.6751482Z test_native_mha 1/1 2025-09-07T06:42:22.6751704Z test_cuda_primary_ctx 1/1 2025-09-07T06:42:22.6751937Z test_nn 1/1 2025-09-07T06:42:22.6752137Z nn/test_pooling 1/1 2025-09-07T06:42:22.6752368Z test_multiprocessing_spawn 1/1 2025-09-07T06:42:22.6752629Z nn/test_convolution 1/1 2025-09-07T06:42:22.6752859Z test_overrides 1/1 2025-09-07T06:42:22.6753075Z test_mobile_optimizer 1/1 2025-09-07T06:42:22.6753317Z test_spectral_ops 1/1 2025-09-07T06:42:22.6753570Z distributions/test_distributions 1/1 2025-09-07T06:42:22.6753832Z doctests 1/1 2025-09-07T06:42:22.6754028Z test_autoload_disable 1/1 2025-09-07T06:42:22.6754263Z test_autoload_enable 1/1 2025-09-07T06:42:22.6754508Z test_cpp_extensions_aot_ninja 1/1 2025-09-07T06:42:22.6754782Z test_cpp_extensions_aot_no_ninja 1/1 2025-09-07T06:42:22.6755055Z Parallel tests (236): 2025-09-07T06:42:22.6755285Z inductor/test_aot_inductor 1/1 2025-09-07T06:42:22.6755556Z inductor/test_triton_extension_backend 1/1 2025-09-07T06:42:22.6755847Z inductor/test_compiled_autograd 2/2 2025-09-07T06:42:22.6756124Z test_comparison_utils 1/1 2025-09-07T06:42:22.6756380Z inductor/test_provenance_tracing 1/1 2025-09-07T06:42:22.6756674Z 
export/test_functionalized_assertions 1/1 2025-09-07T06:42:22.6756955Z test_license 1/1 2025-09-07T06:42:22.6757179Z dynamo/test_base_output 1/1 2025-09-07T06:42:22.6757439Z inductor/test_triton_kernels 1/1 2025-09-07T06:42:22.6757702Z test_mkldnn_verbose 1/1 2025-09-07T06:42:22.6757950Z inductor/test_inductor_utils 1/1 2025-09-07T06:42:22.6758218Z inductor/test_flex_decoding 1/1 2025-09-07T06:42:22.6758624Z cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable 1/1 2025-09-07T06:42:22.6759043Z inductor/test_analysis 1/1 2025-09-07T06:42:22.6759320Z test_rename_privateuse1_to_existing_device 1/1 2025-09-07T06:42:22.6759793Z inductor/test_cutedsl_template 1/1 2025-09-07T06:42:22.6760064Z inductor/test_ck_backend 1/1 2025-09-07T06:42:22.6760323Z inductor/test_memory_planning 1/1 2025-09-07T06:42:22.6760628Z export/test_export_with_inline_and_install 1/1 2025-09-07T06:42:22.6760943Z dynamo/test_skip_guard_eval_unsafe 1/1 2025-09-07T06:42:22.6761226Z inductor/test_inplace_padding 1/1 2025-09-07T06:42:22.6761494Z dynamo/test_buffers_override 1/1 2025-09-07T06:42:22.6761749Z test_custom_ops 1/1 2025-09-07T06:42:22.6761975Z inductor/test_b2b_gemm 1/1 2025-09-07T06:42:22.6762224Z functorch/test_ac_logging 1/1 2025-09-07T06:42:22.6762499Z inductor/test_inductor_annotations 1/1 2025-09-07T06:42:22.6762774Z dynamo/test_resume 1/1 2025-09-07T06:42:22.6763030Z inductor/test_template_heuristics_registry 1/1 2025-09-07T06:42:22.6763328Z inductor/test_debug_trace 1/1 2025-09-07T06:42:22.6763575Z test_ao_sparsity 1/1 2025-09-07T06:42:22.6763812Z inductor/test_async_compile 1/1 2025-09-07T06:42:22.6764067Z dynamo/test_nops 1/1 2025-09-07T06:42:22.6764297Z torch_np/test_nep50_examples 1/1 2025-09-07T06:42:22.6764552Z torch_np/test_binary_ufuncs 1/1 2025-09-07T06:42:22.6764801Z inductor/test_best_config 1/1 2025-09-07T06:42:22.6765037Z test_hop_infra 1/1 2025-09-07T06:42:22.6765257Z torch_np/test_unary_ufuncs 1/1 2025-09-07T06:42:22.6765520Z 
inductor/test_aot_inductor_package 1/1 2025-09-07T06:42:22.6765780Z inductor/test_pad_mm 1/1 2025-09-07T06:42:22.6766022Z typing/test_python_operators 1/1 2025-09-07T06:42:22.6766433Z inductor/test_aot_inductor_custom_ops 1/1 2025-09-07T06:42:22.6766720Z inductor/test_cudagraph_trees 1/1 2025-09-07T06:42:22.6766977Z inductor/test_compile_worker 1/1 2025-09-07T06:42:22.6767223Z dynamo/test_modules 1/1 2025-09-07T06:42:22.6767454Z test_transformers 1/1 2025-09-07T06:42:22.6767678Z dynamo/test_global 1/1 2025-09-07T06:42:22.6767902Z export/test_export 1/1 2025-09-07T06:42:22.6768119Z test_foreach 1/1 2025-09-07T06:42:22.6768343Z test_appending_byte_serializer 1/1 2025-09-07T06:42:22.6768604Z test_fx_experimental 1/1 2025-09-07T06:42:22.6768852Z inductor/test_triton_wrapper 1/1 2025-09-07T06:42:22.6769143Z inductor/test_torchinductor_strided_blocks 1/1 2025-09-07T06:42:22.6769428Z test_file_check 1/1 2025-09-07T06:42:22.6769645Z dynamo/test_interop 1/1 2025-09-07T06:42:22.6769882Z dynamo/test_metrics_context 1/1 2025-09-07T06:42:22.6770139Z test_functionalization 1/1 2025-09-07T06:42:22.6770388Z dynamo/test_inline_and_install 1/1 2025-09-07T06:42:22.6770645Z inductor/test_smoke 1/1 2025-09-07T06:42:22.6770880Z torch_np/test_ufuncs_basic 1/1 2025-09-07T06:42:22.6771121Z test_proxy_tensor 1/1 2025-09-07T06:42:22.6771353Z inductor/test_fx_fusion 1/1 2025-09-07T06:42:22.6771619Z inductor/test_move_constructors_to_cuda 1/1 2025-09-07T06:42:22.6771904Z dynamo/test_skip_non_tensor 1/1 2025-09-07T06:42:22.6772155Z export/test_tree_utils 1/1 2025-09-07T06:42:22.6772391Z dynamo/test_frame_init 1/1 2025-09-07T06:42:22.6772632Z torch_np/test_dtype 1/1 2025-09-07T06:42:22.6772869Z inductor/test_indexing 1/1 2025-09-07T06:42:22.6773117Z inductor/test_minifier_utils 1/1 2025-09-07T06:42:22.6773367Z test_typing 1/1 2025-09-07T06:42:22.6773606Z functorch/test_aot_joint_with_descriptors 1/1 2025-09-07T06:42:22.6773996Z test_utils_filelock 1/1 2025-09-07T06:42:22.6774236Z 
inductor/test_torchinductor 1/1 2025-09-07T06:42:22.6774492Z inductor/test_metrics 1/1 2025-09-07T06:42:22.6774766Z inductor/test_coordinate_descent_tuner 1/1 2025-09-07T06:42:22.6775048Z inductor/test_foreach 1/1 2025-09-07T06:42:22.6775291Z backends/xeon/test_launch 1/1 2025-09-07T06:42:22.6775539Z dynamo/test_functions 1/1 2025-09-07T06:42:22.6775791Z inductor/test_torchinductor_opinfo 1/12 2025-09-07T06:42:22.6776083Z inductor/test_torchinductor_opinfo 4/12 2025-09-07T06:42:22.6776520Z inductor/test_torchinductor_opinfo 5/12 2025-09-07T06:42:22.6776805Z inductor/test_torchinductor_opinfo 8/12 2025-09-07T06:42:22.6777075Z inductor/test_torchinductor_opinfo 9/12 2025-09-07T06:42:22.6777361Z inductor/test_torchinductor_opinfo 12/12 2025-09-07T06:42:22.6777630Z dynamo/test_dicts 1/1 2025-09-07T06:42:22.6777850Z dynamo/test_sdpa 1/1 2025-09-07T06:42:22.6778068Z dynamo/test_list 1/1 2025-09-07T06:42:22.6778295Z inductor/test_autoheuristic 1/1 2025-09-07T06:42:22.6778551Z test_flop_counter 1/1 2025-09-07T06:42:22.6778785Z dynamo/test_fx_graph_runnable 1/1 2025-09-07T06:42:22.6779050Z inductor/test_ordered_set 1/1 2025-09-07T06:42:22.6779294Z dynamo/test_recompiles 1/1 2025-09-07T06:42:22.6779527Z test_per_overload_api 1/1 2025-09-07T06:42:22.6779766Z inductor/test_xpu_basic 1/1 2025-09-07T06:42:22.6780007Z export/test_cpp_serdes 1/1 2025-09-07T06:42:22.6780240Z inductor/test_utils 1/1 2025-09-07T06:42:22.6780479Z inductor/test_cuda_repro 1/1 2025-09-07T06:42:22.6780716Z test_pytree 1/1 2025-09-07T06:42:22.6780920Z inductor/test_fp8 1/1 2025-09-07T06:42:22.6781154Z dynamo/test_nested_graph_breaks 1/1 2025-09-07T06:42:22.6781422Z dynamo/test_pre_dispatch 1/1 2025-09-07T06:42:22.6781674Z dynamo/test_fx_passes_pre_grad 1/1 2025-09-07T06:42:22.6781935Z inductor/test_combo_kernels 1/1 2025-09-07T06:42:22.6782188Z inductor/test_gpu_cpp_wrapper 1/1 2025-09-07T06:42:22.6782452Z inductor/test_device_assert 1/1 2025-09-07T06:42:22.6782706Z inductor/test_op_completeness 1/1 
2025-09-07T06:42:22.6782960Z export/test_tools 1/1 2025-09-07T06:42:22.6783345Z dynamo/test_subgraphs 1/1 2025-09-07T06:42:22.6783591Z dynamo/test_dynamic_shapes 1/1 2025-09-07T06:42:22.6783853Z inductor/test_aot_inductor_utils 1/1 2025-09-07T06:42:22.6784127Z functorch/test_ops 1/3 2025-09-07T06:42:22.6784361Z functorch/test_ops 2/3 2025-09-07T06:42:22.6784603Z inductor/test_cpu_select_algorithm 1/1 2025-09-07T06:42:22.6784866Z xpu/test_gemm 1/1 2025-09-07T06:42:22.6785104Z higher_order_ops/test_invoke_quant 1/1 2025-09-07T06:42:22.6785382Z inductor/test_online_softmax 1/1 2025-09-07T06:42:22.6785649Z inductor/test_split_cat_fx_passes 1/1 2025-09-07T06:42:22.6785922Z test_cuda_expandable_segments 1/1 2025-09-07T06:42:22.6786179Z test_type_hints 1/1 2025-09-07T06:42:22.6786401Z dynamo/test_unittest 1/1 2025-09-07T06:42:22.6786652Z dynamo/test_guard_serialization 1/1 2025-09-07T06:42:22.6786918Z functorch/test_minifier 1/1 2025-09-07T06:42:22.6787158Z test_legacy_vmap 1/1 2025-09-07T06:42:22.6787417Z dynamo/test_cudagraphs_expandable_segments 1/1 2025-09-07T06:42:22.6787722Z torch_np/numpy_tests/core/test_einsum 1/1 2025-09-07T06:42:22.6788007Z inductor/test_benchmarking 1/1 2025-09-07T06:42:22.6788260Z dynamo/test_model_output 1/1 2025-09-07T06:42:22.6788505Z torch_np/test_basic 1/1 2025-09-07T06:42:22.6788732Z test_segment_reductions 1/1 2025-09-07T06:42:22.6788977Z test_ops_fwd_gradients 1/1 2025-09-07T06:42:22.6789214Z inductor/test_compile 1/1 2025-09-07T06:42:22.6789446Z test_pruning_op 1/1 2025-09-07T06:42:22.6789665Z inductor/test_multi_kernel 1/1 2025-09-07T06:42:22.6789930Z inductor/test_decompose_mem_bound_mm 1/1 2025-09-07T06:42:22.6790207Z inductor/test_block_analysis 1/1 2025-09-07T06:42:22.6790460Z inductor/test_minifier_isolate 1/1 2025-09-07T06:42:22.6790716Z export/test_swap 1/1 2025-09-07T06:42:22.6790936Z functorch/test_dims 1/1 2025-09-07T06:42:22.6791168Z profiler/test_profiler 1/1 2025-09-07T06:42:22.6791414Z inductor/test_op_dtype_prop 1/1 
2025-09-07T06:42:22.6791669Z test_tensorexpr_pybind 1/1 2025-09-07T06:42:22.6791927Z inductor/test_split_cat_fx_aten_passes 1/1 2025-09-07T06:42:22.6792200Z dynamo/test_misc 1/1 2025-09-07T06:42:22.6792422Z inductor/test_loop_ordering 1/1 2025-09-07T06:42:22.6792703Z inductor/test_torchinductor_dynamic_shapes 1/2 2025-09-07T06:42:22.6793119Z inductor/test_cutlass_evt 1/1 2025-09-07T06:42:22.6793371Z dynamo/test_sets 1/1 2025-09-07T06:42:22.6793593Z test_numpy_interop 1/1 2025-09-07T06:42:22.6793877Z inductor/test_cudagraph_trees_expandable_segments 1/1 2025-09-07T06:42:22.6794215Z dynamo/test_backward_higher_order_ops 1/1 2025-09-07T06:42:22.6794555Z inductor/test_torchinductor_codegen_config_overrides 1/1 2025-09-07T06:42:22.6794877Z test_nestedtensor 1/1 2025-09-07T06:42:22.6795107Z dynamo/test_export_mutations 1/1 2025-09-07T06:42:22.6795377Z inductor/test_scatter_optimization 1/1 2025-09-07T06:42:22.6795655Z test_ops_jit 1/1 2025-09-07T06:42:22.6795901Z torch_np/numpy_tests/core/test_multiarray 1/2 2025-09-07T06:42:22.6805421Z torch_np/numpy_tests/core/test_multiarray 2/2 2025-09-07T06:42:22.6805777Z functorch/test_ac 1/1 2025-09-07T06:42:22.6806055Z dynamo/test_higher_order_ops 1/1 2025-09-07T06:42:22.6806356Z dynamo/test_comptime 1/1 2025-09-07T06:42:22.6806610Z test_datapipe 1/1 2025-09-07T06:42:22.6806856Z dynamo/test_logging 1/1 2025-09-07T06:42:22.6807110Z dynamo/test_debug_utils 1/1 2025-09-07T06:42:22.6807366Z test_out_dtype_op 1/1 2025-09-07T06:42:22.6807624Z functorch/test_eager_transforms 1/1 2025-09-07T06:42:22.6807904Z export/test_hop 1/1 2025-09-07T06:42:22.6808148Z profiler/test_cpp_thread 1/1 2025-09-07T06:42:22.6808415Z dynamo/test_aot_autograd_cache 1/1 2025-09-07T06:42:22.6808714Z inductor/test_auto_functionalize 1/1 2025-09-07T06:42:22.6809003Z torch_np/test_function_base 1/1 2025-09-07T06:42:22.6809294Z dynamo/test_activation_checkpointing 1/1 2025-09-07T06:42:22.6809945Z 
cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic 1/1 2025-09-07T06:42:22.6810355Z dynamo/test_aot_autograd 1/1 2025-09-07T06:42:22.6810623Z dynamo/test_graph_deduplication 1/1 2025-09-07T06:42:22.6810919Z test_model_exports_to_core_aten 1/1 2025-09-07T06:42:22.6811179Z test_itt 1/1 2025-09-07T06:42:22.6811395Z test_modules 1/3 2025-09-07T06:42:22.6811617Z test_modules 3/3 2025-09-07T06:42:22.6811840Z inductor/test_mps_basic 1/1 2025-09-07T06:42:22.6812094Z test_decomp 2/22 2025-09-07T06:42:22.6812306Z test_decomp 3/22 2025-09-07T06:42:22.6812513Z test_decomp 6/22 2025-09-07T06:42:22.6812718Z test_decomp 7/22 2025-09-07T06:42:22.6812930Z test_decomp 10/22 2025-09-07T06:42:22.6813150Z test_decomp 11/22 2025-09-07T06:42:22.6813360Z test_decomp 14/22 2025-09-07T06:42:22.6813566Z test_decomp 15/22 2025-09-07T06:42:22.6813774Z test_decomp 18/22 2025-09-07T06:42:22.6814050Z test_decomp 19/22 2025-09-07T06:42:22.6814267Z test_decomp 22/22 2025-09-07T06:42:22.6814488Z dynamo/test_einops 1/1 2025-09-07T06:42:22.6814732Z dynamo/test_callback 1/1 2025-09-07T06:42:22.6814985Z nn/test_parametrization 1/1 2025-09-07T06:42:22.6815226Z test_masked 1/1 2025-09-07T06:42:22.6815451Z export/test_experimental 1/1 2025-09-07T06:42:22.6815706Z nn/test_pruning 1/1 2025-09-07T06:42:22.6815944Z export/test_converter 1/1 2025-09-07T06:42:22.6816199Z test_bundled_inputs 1/1 2025-09-07T06:42:22.6816444Z inductor/test_fxir_backend 1/1 2025-09-07T06:42:22.6816742Z torch_np/numpy_tests/lib/test_histograms 1/1 2025-09-07T06:42:22.6817048Z test_maskedtensor 1/1 2025-09-07T06:42:22.6817287Z test_autograd 1/1 2025-09-07T06:42:22.6817521Z dynamo/test_reorder_logs 1/1 2025-09-07T06:42:22.6817782Z dynamo/test_exceptions 1/1 2025-09-07T06:42:22.6818034Z export/test_lift_unlift 1/1 2025-09-07T06:42:22.6818286Z test_public_bindings 1/1 2025-09-07T06:42:22.6818538Z dynamo/test_exc 1/1 2025-09-07T06:42:22.6818773Z test_sparse_semi_structured 1/1 2025-09-07T06:42:22.6819037Z 
dynamo/test_input_attr_tracking 1/1 2025-09-07T06:42:22.6819308Z functorch/test_control_flow 1/1 2025-09-07T06:42:22.6819568Z test_matmul_cuda 1/1 2025-09-07T06:42:22.6819797Z test_dataloader 1/2 2025-09-07T06:42:22.6820012Z test_dataloader 2/2 2025-09-07T06:42:22.6820411Z optim/test_swa_utils 1/1 2025-09-07T06:42:22.6820663Z test_xnnpack_integration 2/4 2025-09-07T06:42:22.6820920Z test_xnnpack_integration 4/4 2025-09-07T06:42:22.6821158Z test_mkldnn 1/1 2025-09-07T06:42:22.6821360Z test_linalg 2/3 2025-09-07T06:42:22.6821566Z test_mkldnn_fusion 1/1 2025-09-07T06:42:22.6821786Z test_sparse_csr 1/1 2025-09-07T06:42:22.6822004Z test_type_promotion 1/1 2025-09-07T06:42:22.6822244Z torch_np/test_reductions 1/1 2025-09-07T06:42:22.6822479Z test_dlpack 1/1 2025-09-07T06:42:22.6822713Z torch_np/numpy_tests/core/test_scalar_ctors 1/1 2025-09-07T06:42:22.6823028Z profiler/test_profiler_tree 1/1 2025-09-07T06:42:22.6823280Z test_prims 1/1 2025-09-07T06:42:22.6823488Z test_jit_autocast 1/1 2025-09-07T06:42:22.6823723Z profiler/test_torch_tidy 1/1 2025-09-07T06:42:22.6823971Z profiler/test_python_tracer 1/1 2025-09-07T06:42:22.6824222Z lazy/test_reuse_ir 1/1 2025-09-07T06:42:22.6824445Z test_quantization 1/13 2025-09-07T06:42:22.6824672Z test_quantization 2/13 2025-09-07T06:42:22.6824895Z test_quantization 5/13 2025-09-07T06:42:22.6825119Z test_quantization 6/13 2025-09-07T06:42:22.6825341Z test_quantization 9/13 2025-09-07T06:42:22.6825567Z test_quantization 10/13 2025-09-07T06:42:22.6825798Z test_quantization 13/13 2025-09-07T06:42:22.6826038Z Name: excluded (est. time: 0.0min) 2025-09-07T06:42:22.6826293Z Serial tests (0): 2025-09-07T06:42:22.6826505Z Parallel tests (0): 2025-09-07T06:42:22.6826821Z Running test_ci_sanity_check_fail 1/1 ... 
[2025-09-07 06:42:22.682375] 2025-09-07T06:42:22.6827356Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:42:22.6828677Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ci_sanity_check_fail.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:42:22.682693] 2025-09-07T06:42:36.4610730Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T06:42:36.4611675Z Uploading artifacts took 0.00 seconds 2025-09-07T06:42:36.4612400Z Running test_utils 1/1 ... [2025-09-07 06:42:36.461023] 2025-09-07T06:42:36.4613049Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:42:36.4615903Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:42:36.461333] 2025-09-07T06:42:53.5016187Z 2025-09-07T06:42:53.5016973Z test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_1.1_ee94383e2ae93732_.log 2025-09-07T06:42:53.6738791Z Running 6012 items in this shard: test/test_utils.py::TestCheckpoint::test_checkpoint, test/test_utils.py::TestCheckpoint::test_checkpoint_module_list, test/test_utils.py::TestCheckpoint::test_checkpoint_no_tensors, test/test_utils.py::TestCheckpoint::test_checkpoint_non_tensor, test/test_utils.py::TestCheckpoint::test_checkpoint_non_tensor_inputs_outputs, test/test_utils.py::TestCheckpoint::test_checkpoint_not_preserve_rng_state_and_without_reentrant, test/test_utils.py::TestCheckpoint::test_checkpoint_partial_grad, test/test_utils.py::TestCheckpoint::test_checkpoint_rng_cpu, test/test_utils.py::TestCheckpoint::test_checkpoint_rng_cuda, test/test_utils.py::TestCheckpoint::test_checkpoint_sequential_deprecated_multiple_args, test/test_utils.py::TestCheckpoint::test_checkpoint_sequential_deprecated_no_args, test/test_utils.py::TestCheckpoint::test_checkpoint_trigger, test/test_utils.py::TestCheckpoint::test_checkpoint_valid, test/test_utils.py::TestCheckpoint::test_checkpointing_without_reentrant_early_free, test/test_utils.py::TestCheckpoint::test_get_device_states_recursive, test/test_utils.py::TestCheckpoint::test_infer_device_state_recursive_meta, test/test_utils.py::TestCheckpoint::test_infer_device_state_recursive_multi_cuda, test/test_utils.py::TestDataLoaderUtils::test_multi_drop, test/test_utils.py::TestDataLoaderUtils::test_multi_keep, test/test_utils.py::TestDataLoaderUtils::test_random_seed, test/test_utils.py::TestDataLoaderUtils::test_single_drop, test/test_utils.py::TestDataLoaderUtils::test_single_keep, test/test_utils.py::TestBottleneck::test_bottleneck_cpu_only, test/test_utils.py::TestBottleneck::test_bottleneck_cuda, test/test_utils.py::TestCollectEnv::test_smoke, 
test/test_utils.py::TestHipify::test_import_hipify, test/test_utils.py::TestHipifyTrie::test_add_and_search_trie, test/test_utils.py::TestHipifyTrie::test_add_multiple_and_search_trie, test/test_utils.py::TestHipifyTrie::test_char_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_prefix_words_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_quote_escape, test/test_utils.py::TestHipifyTrie::test_single_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_special_char_export_trie_to_regex, test/test_utils.py::TestAssert::test_assert_scriptable, test/test_utils.py::TestAssert::test_assert_true, test/test_utils.py::TestStandaloneCPPJIT::test_load_standalone, test/test_utils.py::TestRenderUtils::test_basic, test/test_utils.py::TestDeviceUtilsCUDA::test_basic_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_decorator_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_decorator_generator_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_shapes_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdist_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdist_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_complex_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_complex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_complex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e4m3fn, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e4m3fnuz, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e5m2, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e5m2fnuz, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_uint16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_uint32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igamma_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igamma_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igammac_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igammac_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_imag_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_imag_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_imag_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_istft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_istft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanquantile_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanquantile_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_ctc_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_ctc_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_one_hot_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pdist_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pdist_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_complex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_complex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polar_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polar_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_quantile_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_quantile_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_bartlett_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_bartlett_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_blackman_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_blackman_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_cosine_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_cosine_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_exponential_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_exponential_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_gaussian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_gaussian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_cosine_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_cosine_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_hamming_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_hamming_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hamming_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hamming_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hann_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hann_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_kaiser_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_kaiser_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_nuttall_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_nuttall_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch__scaled_mm_cuda_float8_e4m3fn, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__flash_attention_forward_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_indices_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_indices_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_indices_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_indices_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_complex_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_complex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_complex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_real_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_real_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_get_default_device_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_get_default_device_more_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_nn_module_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_set_default_device_cuda, test/test_utils.py::TestCppExtensionUtils::test_cc_compiler_is_ok, test/test_utils.py::TestCppExtensionUtils::test_cpp_compiler_is_ok, test/test_utils.py::TestTraceback::test_basic, test/test_utils.py::TestTraceback::test_captured_traceback, test/test_utils.py::TestTraceback::test_captured_traceback_format_all, test/test_utils.py::TestTraceback::test_captured_traceback_format_all_cached, test/test_utils.py::TestTraceback::test_format_traceback_short, test/test_utils.py::TestTryImport::test_import_existing, test/test_utils.py::TestTryImport::test_import_imported, test/test_utils.py::TestTryImport::test_import_missing, test/test_utils.py::TestDeprecate::test_deprecated 2025-09-07T06:42:53.8386155Z 2025-09-07T06:42:53.8386336Z Running test_reductions 1/1 ... [2025-09-07 06:42:53.510038] 2025-09-07T06:42:53.8386672Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:42:53.8387529Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:42:53.510365] 2025-09-07T06:43:07.4447525Z 2025-09-07T06:43:07.4448793Z test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_reductions_1.1_2a4accb39e86f630_.log 2025-09-07T06:43:07.5871699Z Running 4755 items in this shard: test/test_reductions.py::TestReductionsCUDA::test_accreal_type_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_all_any_with_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_issue117215_cuda, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_amin_amax_some_dims_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_axis_with_dim_one_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_large_axis_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_bincount_cuda, test/test_reductions.py::TestReductionsCUDA::test_bucketization_cuda, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_cumprod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_cumsum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_unbiased_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_prod_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_logsumexp_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nanmean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_less_than_64_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_linalg_vector_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nansum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_linalg_vector_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histogram_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogram_error_handling_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogramdd_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_integral_promotion_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_max_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mean_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_int_with_optdtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_corner_cases_cuda, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_max_nan_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_minmax_illegal_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_boolean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_device_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_numpy_named_args_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_bool_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_quantile_backward_cuda, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_quantile_error_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_reduce_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_empty_any_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_split_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_repeated_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_cpu_device_mismatch_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_reduction_uint8_overflow_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_out_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_parallel_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_argmax_argmix_kthvalue_dim_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_reduce_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_large_input_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability2_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float64
2025-09-07T06:43:07.7247498Z
2025-09-07T06:43:07.7247672Z Running test_extension_utils 1/1 ... [2025-09-07 06:43:07.451294]
2025-09-07T06:43:07.7248048Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:43:07.7248925Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_extension_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:43:07.451632]
2025-09-07T06:43:10.9713188Z
2025-09-07T06:43:10.9714247Z test_extension_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_extension_utils_1.1_4f5a0484ffc35462_.log
2025-09-07T06:43:10.9716345Z Running 2 items in this shard: test/test_extension_utils.py::TestExtensionUtils::test_external_module_register, test/test_extension_utils.py::TestExtensionUtils::test_external_module_register_with_renamed_backend
2025-09-07T06:43:10.9717620Z
2025-09-07T06:43:10.9717968Z Running inductor/test_flex_attention 1/1 ... [2025-09-07 06:43:10.971539]
2025-09-07T06:43:10.9718634Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:43:10.9721444Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_attention.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:43:10.971854]
2025-09-07T06:43:21.0007087Z
2025-09-07T06:43:21.0008286Z inductor/test_flex_attention 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_attention_1.1_97c2cf04d3c58d7d_.log
2025-09-07T06:43:21.0323718Z Running 726 items in this shard: test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_causal_mask_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod2_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod3_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod4_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod5_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_autograd_function_in_score_mod_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_block_mask_non_divisible_cuda,
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE3_cuda_float32, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_128_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_128_cuda_float32, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_256_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_256_cuda_float32, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod1_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod4_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod4_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod6_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod6_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod6_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod1_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_cant_lower_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_buffers_all_dims_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_buffers_all_dims_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_buffers_all_dims_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_reduction_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_scale_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_wrong_device_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_causal_block_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_causal_block_non_divisible_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_causal_block_non_divisible_with_captured_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_causal_block_paged_attention_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_cpu_error_message_return_lse_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_custom_block_mask_generator_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_debug_flag_disables_internal_compilation_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dependent_causal_bidirectional_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_device_cuda_1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_differentiable_logsumexp_compiled_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_differentiable_logsumexp_gradcheck_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_doc_mask_sparse_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_document_masking_edge_case_mode_aot_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_document_masking_edge_case_mode_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dynamic_divisibility_guards_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dynamic_shapes_bug_dynamic_batch_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dynamic_shapes_with_custom_kernel_options_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dynamic_shapes_with_max_autotune_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_eager_backward_strides_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_epilogue_fused_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order1_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order2_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order1_shape0_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order1_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order2_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order1_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order2_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order3_shape0_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_eager_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order1_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order2_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order0_shape0_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order1_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order2_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_float32_matmul_precision_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_force_write_lse_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_free_symbol_dynamic_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_0_check_compile_False_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_0_check_compile_True_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fw_bw_graph_correctness_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_index_multiple_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_index_weird1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_index_weird2_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_inputs_are_realized_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_invalid_block_size_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kernel_options_argument_is_respected_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod6_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod5_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod4_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod2_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod7_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod4_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod1_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_large_batch_heads_grid_dimension_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_from_bias_head_seq_batch_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_from_bias_seq_batch_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_from_bias_seq_only_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_from_view_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_rel_bias_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_only_return_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_flex_attention_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_flex_decode_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_make_block_mask_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mask_mod_combiners_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_max_autotune_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_max_autotune_with_captured_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mixed_device_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mixed_dtypes_fails_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_modular_indexing_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_mask_calls_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_score_mod_calls2_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_score_mod_calls2_paged_attention_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_score_mod_calls_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_score_mod_calls_paged_attention_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_natten_2d_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_new_empty_mask_mod_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_contiguous_last_dim_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_divisible_with_captured_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod0_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod0_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod0_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod0_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod0_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod0_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_float32, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod2_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod2_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod2_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod2_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod2_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod2_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims0_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_pow_2_headdim_head_dim_121_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_pow_2_headdim_head_dim_17_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_pow_2_headdim_head_dim_24_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_pow_2_headdim_head_dim_94_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_num_warps_8_error_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_padded_dense_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_qkv_and_block_mask_on_the_same_device_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_recompile_changed_score_mod_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_reduction_unrolled_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_ops_to_save0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_ops_to_save1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_ops_to_save2_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_seq_masking_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_silu_on_score_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_skip_odd_keys_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_skip_odd_keys_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_skip_odd_keys_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_small_block_mask_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_small_q_kv_len_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_backwards_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s0_v_s0_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s0_v_s0_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s0_v_s0_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s1_v_s1_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s1_v_s1_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s1_v_s1_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s2_v_s2_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s2_v_s2_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s2_v_s2_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s2_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s0_v_s0_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s0_v_s0_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s0_v_s0_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s2_v_s2_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s2_v_s2_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s2_v_s2_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s3_v_s3_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s3_v_s3_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s3_v_s3_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_subgraph_respect_decompostion_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_symbol_closure_in_score_mod_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_tensor_subclass_dispatch_order_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_tma_with_customer_kernel_options_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_triton_template_warp_specialization_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_validate_small_embedding_size_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_zero_length_sequence_error_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_allocate_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_convert_logical_block_mask_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_convert_mask_mod_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_page_allocation_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod0_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod2_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod3_cuda_float32, 
test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod4_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod4_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod6_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod6_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod7_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod7_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_update_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_backward_error_with_none_q_indices_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_attributes_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_device_change_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_operations_with_none_q_indices_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_viz_cuda, 
test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_vs_sequence_lengths_compile_False_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_vs_sequence_lengths_compile_True_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_changes_BLOCK_SIZE4_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_changes_BLOCK_SIZE5_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_changes_BLOCK_SIZE_128_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_changes_BLOCK_SIZE_256_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_changes_BLOCK_SIZE_32_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_changes_BLOCK_SIZE_64_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_size_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_compiling_create_block_mask_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_compiling_create_block_mask_no_recompile_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_create_is_cuda_graphable_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_doc_mask_clamped_repro_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_eager_tracing_correctness_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_forward_pass_with_none_q_indices_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_from_kv_blocks_full_indices_False_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_from_kv_blocks_full_indices_True_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_from_kv_blocks_without_q_computation_full_indices_False_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_from_kv_blocks_without_q_computation_full_indices_True_cuda, 
test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_getitem_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_init_mismatched_full_kv_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_init_mismatched_full_q_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_upcast_appropriately_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_comparison_vs_sdpa_with_learnable_bias_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flex_attention_with_dynamic_max_autotune_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flex_attention_with_dynamic_max_autotune_graph_partition_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_bias_req_grad_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_inspect_bug_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_default_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_mode_default_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda
2025-09-07T06:43:21.0624117Z
2025-09-07T06:43:21.0624335Z Running inductor/test_cutlass_backend 1/1 ... [2025-09-07 06:43:21.002471]
2025-09-07T06:43:21.0624736Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:43:21.0625648Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutlass_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:43:21.002803]
2025-09-07T06:43:28.0273755Z
2025-09-07T06:43:28.0275085Z inductor/test_cutlass_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutlass_backend_1.1_94104cfc715c46d2_.log
2025-09-07T06:43:28.0336003Z Running 152 items in this shard: test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_aoti_workspace_ptr, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_check_paths, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_compilation_time_use_aoti_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_compilation_time_use_aoti_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_config_number_post_filtering, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_fp8_scaled_mm_fast_accum_filtering, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_integration, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_matmul_nonzero_offset, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_matmul_same_tensor, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_op_allowlist, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_op_denylist, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_shape_coverage_mm, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_subproc_addmm_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_subproc_addmm_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_subproc_bmm, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_backend_subproc_mm, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_key, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_presets_presets_, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_presets_presets_0, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_cutlass_presets_presets_0,999, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_diff_matmul_share_same_kernel_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_diff_matmul_share_same_kernel_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_broadcasting_add, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_broadcasting_div, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_broadcasting_mul, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_broadcasting_sub, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_flexible_layout, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_add_shape0, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_add_shape1, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_add_shape2, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_add_shape3, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_div_shape0, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_div_shape1, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_div_shape2, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_div_shape3, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_mul_shape0, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_mul_shape1, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_mul_shape2, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_mul_shape3, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_relu_shape0, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_relu_shape1, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_relu_shape2, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_relu_shape3, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_sub_shape0, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_sub_shape1, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_sub_shape2, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_fusions_basic_sub_shape3, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_mixed_dtypes_add, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_mixed_dtypes_div, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_mixed_dtypes_mul, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_mixed_dtypes_relu, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_mixed_dtypes_sub, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_op_add, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_op_div, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_op_mul, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_op_relu, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_op_sub, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_add_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_add_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_div_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_div_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_mul_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_mul_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_relu_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_relu_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_sub_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_multi_output_sub_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_return_accumulator, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_reuse_matmul_input_add, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_reuse_matmul_input_div, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_reuse_matmul_input_mul, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_reuse_matmul_input_relu, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_evt_reuse_matmul_input_sub, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_filtered_ops_cache, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_flexible_layout, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_force_cutlass_backend_aoti_cexpr_codegen, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_force_cutlass_backend_aoti_dynamic, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_False_use_fast_accum_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_False_use_fast_accum_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_False_use_fast_accum_True_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_False_use_fast_accum_True_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_True_use_fast_accum_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_True_use_fast_accum_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_True_use_fast_accum_True_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_float8_e4m3fn_shape0_has_bias_True_use_fast_accum_True_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_multiple_linear_float8_e4m3fn_shape0_use_fast_accum_True_use_aoti_False_dynamic_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_multiple_linear_float8_e4m3fn_shape0_use_fast_accum_True_use_aoti_False_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_multiple_linear_float8_e4m3fn_shape0_use_fast_accum_True_use_aoti_True_dynamic_False, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_rowwise_scaling_multiple_linear_float8_e4m3fn_shape0_use_fast_accum_True_use_aoti_True_dynamic_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_tensorwise_scaling_float8_e4m3fn_shape0_has_bias_False_use_fast_accum_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_tensorwise_scaling_float8_e4m3fn_shape0_has_bias_False_use_fast_accum_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_tensorwise_scaling_float8_e4m3fn_shape0_has_bias_True_use_fast_accum_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_fp8_tensorwise_scaling_float8_e4m3fn_shape0_has_bias_True_use_fast_accum_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_gemm_operation_serialization_arch_100_cuda_version_12_4, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_gemm_operation_serialization_arch_100_cuda_version_12_8, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_gemm_operation_serialization_arch_90_cuda_version_12_4, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_gemm_operation_serialization_arch_90_cuda_version_12_8, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_get_max_alignment, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_import_cutlass, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_False_use_aoti_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_False_use_aoti_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_False_use_aoti_True_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_False_use_aoti_True_float16, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_True_use_aoti_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_True_use_aoti_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_True_use_aoti_True_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_addmm_dynamic_True_use_aoti_True_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_False_bfloat16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_False_bfloat16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_False_float16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_False_float16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_True_bfloat16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_True_bfloat16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_True_float16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_False_use_aoti_True_float16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_False_bfloat16_use_expand_False, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_False_bfloat16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_False_float16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_False_float16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_True_bfloat16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_True_bfloat16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_True_float16_use_expand_False, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_bmm_dynamic_True_use_aoti_True_float16_use_expand_True, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_chained_fusion_fp16_fp32acc, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_fp8_scaled_mm_dynamic_False_use_aoti_False_float8_e4m3fn, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_fp8_scaled_mm_dynamic_False_use_aoti_True_float8_e4m3fn, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_fp8_scaled_mm_dynamic_True_use_aoti_False_float8_e4m3fn, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_fp8_scaled_mm_dynamic_True_use_aoti_True_float8_e4m3fn, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_int_mm_dynamic_False, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_no_fusion_dtype_mismatch, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_False_use_aoti_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_False_use_aoti_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_False_use_aoti_True_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_False_use_aoti_True_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_True_use_aoti_False_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_True_use_aoti_False_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_True_use_aoti_True_bfloat16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_dynamic_True_use_aoti_True_float16, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_regular_mm_streamk, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_relu6_fusion_fp16_fp32acc, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_relu_fusion_fp16_fp32acc, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_shape_dependent_normalization_fusion, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_simple_fusion_fp16_fp32acc, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_backend_sparse_semi_structured_mm_dynamic_False, 
test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_max_autotune_cutlass_threshold, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_maybe_append_choice_caching, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_multiple_mm, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_multiple_mm_with_dynamic_shape, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_number_mm_precompiles, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_standalone_runner, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_streamk_with_dynamic, test/inductor/test_cutlass_backend.py::TestCutlassBackend::test_streamk_with_static 2025-09-07T06:43:28.0391447Z 2025-09-07T06:43:28.0391618Z Running test_cpp_api_parity 1/1 ... [2025-09-07 06:43:28.027971] 2025-09-07T06:43:28.0391966Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:43:28.0392813Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_api_parity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:43:28.028295] 2025-09-07T06:44:24.8199695Z 2025-09-07T06:44:24.8200914Z test_cpp_api_parity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_api_parity_1.1_8cf94b9dd3a2b69f_.log 2025-09-07T06:44:24.8364331Z Running 488 items in this shard: test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special_cuda
2025-09-07T06:44:24.8510130Z
2025-09-07T06:44:24.8510288Z Running test_fx 1/1 ... [2025-09-07 06:44:24.821024]
2025-09-07T06:44:24.8510612Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:44:24.8511431Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:44:24.821355]
2025-09-07T06:47:56.8369641Z
2025-09-07T06:47:56.8370753Z test_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_1.1_4dc224d98a1e79ad_.log
2025-09-07T06:47:56.8750471Z Running 1269 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cuda,
test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cuda, test/test_fx.py::TestCSEPass::test_banned_list, test/test_fx.py::TestCSEPass::test_empty, test/test_fx.py::TestCSEPass::test_immutable_list_multiple_entries, test/test_fx.py::TestCSEPass::test_immutable_list_type, test/test_fx.py::TestCSEPass::test_kwarg, test/test_fx.py::TestCSEPass::test_nested_immutable_list_type, test/test_fx.py::TestCSEPass::test_nochange, test/test_fx.py::TestCSEPass::test_rand_like, test/test_fx.py::TestCSEPass::test_rand_n, test/test_fx.py::TestCSEPass::test_random, test/test_fx.py::TestCSEPass::test_simple, test/test_fx.py::TestCSEPass::test_simple_2, test/test_fx.py::TestCSEPass::test_simple_multiple_same_ops, test/test_fx.py::TestCSEPass::test_two_args, test/test_fx.py::TestCSEPass::test_two_args_default, test/test_fx.py::TestDCE::test_dead_chain, test/test_fx.py::TestDCE::test_dead_getattr, test/test_fx.py::TestDCE::test_dead_placeholder, test/test_fx.py::TestDCE::test_dead_placeholder_with_user, test/test_fx.py::TestDCE::test_impure_custom, test/test_fx.py::TestDCE::test_impure_kwargs, test/test_fx.py::TestDCE::test_impure_nodes_args, test/test_fx.py::TestDCE::test_impure_random, test/test_fx.py::TestDCE::test_keep_collectives, test/test_fx.py::TestDCE::test_keep_collectives_no_overload, test/test_fx.py::TestDCE::test_keep_module_with_side_effects, test/test_fx.py::TestDCE::test_keep_setitem, test/test_fx.py::TestDCE::test_keep_torch_assert, test/test_fx.py::TestDCE::test_simple, test/test_fx.py::TestConstFold::test_check_inline_non_const, test/test_fx.py::TestConstFold::test_check_inline_non_const_mult_return, 
test/test_fx.py::TestConstFold::test_check_skip_folding_quant_dequant_pattern, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_no_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_placeholder_reordered, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr_three_input, test/test_fx.py::TestConstFold::test_const_fold_has_inlined_call_module_node, test/test_fx.py::TestConstFold::test_const_fold_module_attr, test/test_fx.py::TestConstFold::test_const_fold_multi_const_folded_attrs, test/test_fx.py::TestConstFold::test_const_fold_noop, test/test_fx.py::TestConstFold::test_const_fold_submod_hierarchy, test/test_fx.py::TestConstFold::test_const_fold_tensor_meta, test/test_fx.py::TestConstFold::test_const_fold_unused_placeholder, test/test_fx.py::TestConstFold::test_dict_output, test/test_fx.py::TestConstFold::test_fold_module, test/test_fx.py::TestConstFold::test_retain_node_meta, test/test_fx.py::TestConstFold::test_three_outputs, test/test_fx.py::TestConstFold::test_two_outputs, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_dim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_ndim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_nelement_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_numel_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_shape_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_size_const, test/test_fx.py::AnnotationsTest::test_annotate, test/test_fx.py::AnnotationsTest::test_annotations, test/test_fx.py::AnnotationsTest::test_broadcasting1, test/test_fx.py::AnnotationsTest::test_broadcasting2, test/test_fx.py::AnnotationsTest::test_broadcasting3, test/test_fx.py::AnnotationsTest::test_consistency, test/test_fx.py::AnnotationsTest::test_precision, 
test/test_fx.py::TypeCheckerTest::test_flatten_fully_static, test/test_fx.py::TypeCheckerTest::test_resnet50, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast_2, test/test_fx.py::TypeCheckerTest::test_type_check_add_false, test/test_fx.py::TypeCheckerTest::test_type_check_add_true, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_scalar, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_false, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_symbolic, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2_fully_static, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_types, test/test_fx.py::TypeCheckerTest::test_type_check_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_flatten3, test/test_fx.py::TypeCheckerTest::test_type_check_flatten_2, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true_param_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_true, test/test_fx.py::TypeCheckerTest::test_type_check_symbolic_inferenceconv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_False, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_true, test/test_fx.py::TypeCheckerTest::test_type_maxpool2d_fully_static, 
test/test_fx.py::TypeCheckerTest::test_type_typechecl_maxpool2d_3dinput, test/test_fx.py::TypeCheckerTest::test_typecheck_basicblock, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_function, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_module, test/test_fx.py::TestMatcher::test_split_to_graph_and_name_node_map, test/test_fx.py::TestMatcher::test_subgraph_matcher_ignore_literals, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_attributes, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list_bad, test/test_fx.py::TestMatcher::test_variatic_arg_matching, test/test_fx.py::TestPassManager::test_pass_manager, test/test_fx.py::TestPassManager::test_pass_manager_bad_checks, test/test_fx.py::TestPassManager::test_pass_manager_checks, test/test_fx.py::TestPassManager::test_pass_manager_error, test/test_fx.py::TestPassManager::test_this_before_that_pass_constraint, test/test_fx.py::TestPassManager::test_topological_sort, test/test_fx.py::TestSourceMatcher::test_legalize_slice, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_False, 
test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_True, test/test_fx.py::TestSubgraphRewriter::test_matching_pattern_with_list_type_arg, test/test_fx.py::TestSubgraphRewriter::test_matching_variable_arguments, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_callback, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_filters, test/test_fx.py::TestSubgraphRewriter::test_replaced_nodes, test/test_fx.py::TestSubgraphRewriter::test_replacement_with_attrs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_call_method, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_internal_pattern_nodes_cannot_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_local_revert, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_nodes_with_kwargs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_output_pattern_node_can_have_users_that_are_not_matched, 
test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_placeholder_matching, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_preserves_logic, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_consecutive_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_duplicated_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_multiple_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replaces_referenced_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_single_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_traced_as_callable, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_overlapping_matches, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_trivial_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_args, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_results, test/test_fx.py::TestFX::test_all_input_nodes, test/test_fx.py::TestFX::test_annotation_with_future, test/test_fx.py::TestFX::test_annotations_empty_tuple, test/test_fx.py::TestFX::test_annotations_with_forward_references, test/test_fx.py::TestFX::test_annotations_with_no_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_internal_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_no_internal_forward_references, test/test_fx.py::TestFX::test_args_kwargs, test/test_fx.py::TestFX::test_args_kwargs_no_self, test/test_fx.py::TestFX::test_assert, test/test_fx.py::TestFX::test_ast_rewriter_reassigns_submodules, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert_with_message, 
test/test_fx.py::TestFX::test_ast_rewriter_wrap, test/test_fx.py::TestFX::test_ast_rewriter_wrap_fn_directly, test/test_fx.py::TestFX::test_ast_rewriter_wrap_with_submodule, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_autowrap_functions, test/test_fx.py::TestFX::test_concrete_arg_none_assert, test/test_fx.py::TestFX::test_construct_root_dict, test/test_fx.py::TestFX::test_control_flow_tracing, test/test_fx.py::TestFX::test_copy_it, test/test_fx.py::TestFX::test_copy_no_remap, test/test_fx.py::TestFX::test_ctx_mgr, test/test_fx.py::TestFX::test_custom_codegen, test/test_fx.py::TestFX::test_custom_codegen_with_transformer, test/test_fx.py::TestFX::test_custom_import, test/test_fx.py::TestFX::test_custom_proxy_dynamic_value, test/test_fx.py::TestFX::test_custom_proxy_input_dependent_control_flow, test/test_fx.py::TestFX::test_custom_proxy_type, test/test_fx.py::TestFX::test_custom_proxy_type_literal, test/test_fx.py::TestFX::test_custom_traceback_not_raised_when_exception_source_is_submodule, test/test_fx.py::TestFX::test_custom_traceback_raised_when_exception_source_is_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graph_with_tracer_cls, test/test_fx.py::TestFX::test_deepcopy_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graphmodule_with_transform, test/test_fx.py::TestFX::test_deepcopy_no_recursion, test/test_fx.py::TestFX::test_deepcopy_recursion_depth, test/test_fx.py::TestFX::test_deepcopy_tracer, test/test_fx.py::TestFX::test_deepcopy_with_submods_params, test/test_fx.py::TestFX::test_delete_unused_submodules_leaf, test/test_fx.py::TestFX::test_delete_unused_values, test/test_fx.py::TestFX::test_dict, test/test_fx.py::TestFX::test_direct_param_use, test/test_fx.py::TestFX::test_disallow_override, test/test_fx.py::TestFX::test_ellipsis, test/test_fx.py::TestFX::test_empty_graph_codegen, 
test/test_fx.py::TestFX::test_enum, test/test_fx.py::TestFX::test_erase_node_error, test/test_fx.py::TestFX::test_example_shape_prop, test/test_fx.py::TestFX::test_find_uses, test/test_fx.py::TestFX::test_fn_type_annotation_empty, test/test_fx.py::TestFX::test_fn_type_annotations, test/test_fx.py::TestFX::test_fx_and_or, test/test_fx.py::TestFX::test_fx_create_arg, test/test_fx.py::TestFX::test_fx_shifts, test/test_fx.py::TestFX::test_fx_stateless, test/test_fx.py::TestFX::test_get_torch_func_signature, test/test_fx.py::TestFX::test_getitem, test/test_fx.py::TestFX::test_getitem_subproc, test/test_fx.py::TestFX::test_graph_edit_with_proxy, test/test_fx.py::TestFX::test_graph_fns, test/test_fx.py::TestFX::test_graph_module, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_dict_init, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_mod_init, test/test_fx.py::TestFX::test_graph_module_replicate_for_dp, test/test_fx.py::TestFX::test_graph_unique_names, test/test_fx.py::TestFX::test_graph_unique_names_manual, test/test_fx.py::TestFX::test_immutable_dict_pytree_ops, test/test_fx.py::TestFX::test_immutable_list_pytree_ops, test/test_fx.py::TestFX::test_imul_code_print, test/test_fx.py::TestFX::test_inf_nan, test/test_fx.py::TestFX::test_inf_nan_kwds, test/test_fx.py::TestFX::test_informative_co_filename, test/test_fx.py::TestFX::test_inline_graph, test/test_fx.py::TestFX::test_insert_arg, test/test_fx.py::TestFX::test_insertion_point, test/test_fx.py::TestFX::test_interpreter, test/test_fx.py::TestFX::test_interpreter_default_args, test/test_fx.py::TestFX::test_interpreter_gc_values, test/test_fx.py::TestFX::test_interpreter_noop_resnet18, test/test_fx.py::TestFX::test_interpreter_not_enough_args, test/test_fx.py::TestFX::test_interpreter_onthefly_swap, test/test_fx.py::TestFX::test_interpreter_other_graph, test/test_fx.py::TestFX::test_interpreter_partial_eval, test/test_fx.py::TestFX::test_interpreter_run_node_override, 
test/test_fx.py::TestFX::test_interpreter_star_args, test/test_fx.py::TestFX::test_interpreter_with_codegen, test/test_fx.py::TestFX::test_layout, test/test_fx.py::TestFX::test_leaf_module, test/test_fx.py::TestFX::test_lineno_map, test/test_fx.py::TestFX::test_matmul_tracing, test/test_fx.py::TestFX::test_metadata_on_ph, test/test_fx.py::TestFX::test_module_deepcopy_edit_nodes, test/test_fx.py::TestFX::test_move_before, test/test_fx.py::TestFX::test_multi_insert_point, test/test_fx.py::TestFX::test_multiple_default_args, test/test_fx.py::TestFX::test_named_tuple_inlined, test/test_fx.py::TestFX::test_namedtuple_return_qualname, test/test_fx.py::TestFX::test_namedtuple_return_trace, test/test_fx.py::TestFX::test_native_callable, test/test_fx.py::TestFX::test_nn_module_stack, test/test_fx.py::TestFX::test_no_mutation, test/test_fx.py::TestFX::test_node_tagging, test/test_fx.py::TestFX::test_nonetype_annotation, test/test_fx.py::TestFX::test_partial_trace, test/test_fx.py::TestFX::test_pickle_custom_import, test/test_fx.py::TestFX::test_pickle_graphmodule, test/test_fx.py::TestFX::test_pickle_nonetype_annotation, test/test_fx.py::TestFX::test_pickle_torch_custom_ops, test/test_fx.py::TestFX::test_prepend_self, test/test_fx.py::TestFX::test_pretty_print, test/test_fx.py::TestFX::test_pretty_print_graph, test/test_fx.py::TestFX::test_pretty_print_node, test/test_fx.py::TestFX::test_pretty_print_targets, test/test_fx.py::TestFX::test_print_graph, test/test_fx.py::TestFX::test_profiler_ranges_side_effect, test/test_fx.py::TestFX::test_proxy_deepcopy_with_tracer, test/test_fx.py::TestFX::test_proxy_deepcopy_without_tracer, test/test_fx.py::TestFX::test_pytree, test/test_fx.py::TestFX::test_pytree_concrete, test/test_fx.py::TestFX::test_reassign_args_kwargs_uses, test/test_fx.py::TestFX::test_regular_and_default_args, test/test_fx.py::TestFX::test_remove_uses, test/test_fx.py::TestFX::test_remove_uses_with_custom_filter, test/test_fx.py::TestFX::test_replace_input, 
test/test_fx.py::TestFX::test_replace_uses, test/test_fx.py::TestFX::test_reserved_getattr, test/test_fx.py::TestFX::test_return_tuple, test/test_fx.py::TestFX::test_return_type_exists, test/test_fx.py::TestFX::test_return_type_exists_pre_pep585, test/test_fx.py::TestFX::test_script_method_trace, test/test_fx.py::TestFX::test_script_tensor_constant, test/test_fx.py::TestFX::test_sequential, test/test_fx.py::TestFX::test_shape_prop_aggregate, test/test_fx.py::TestFX::test_shape_prop_layout, test/test_fx.py::TestFX::test_shape_prop_layout_3d, test/test_fx.py::TestFX::test_shape_prop_unbacked_sym, test/test_fx.py::TestFX::test_single_default_arg, test/test_fx.py::TestFX::test_snake_case, test/test_fx.py::TestFX::test_sqrt, test/test_fx.py::TestFX::test_stack_traces, test/test_fx.py::TestFX::test_stack_traces_with_transformer, test/test_fx.py::TestFX::test_string_literal_return, test/test_fx.py::TestFX::test_submodule_manipulation_API, test/test_fx.py::TestFX::test_symbolic_trace_assert, test/test_fx.py::TestFX::test_symbolic_trace_sequential, test/test_fx.py::TestFX::test_tensor_attribute, test/test_fx.py::TestFX::test_tensor_attribute_coalseced, test/test_fx.py::TestFX::test_tensor_constant, test/test_fx.py::TestFX::test_throw_out_variant, test/test_fx.py::TestFX::test_torch_custom_ops, test/test_fx.py::TestFX::test_torch_fx_getattr, test/test_fx.py::TestFX::test_torch_fx_len, test/test_fx.py::TestFX::test_torch_op_overloads, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx_tensor_arg, test/test_fx.py::TestFX::test_trace_buffer_slice, test/test_fx.py::TestFX::test_trace_dict_int_keys, test/test_fx.py::TestFX::test_trace_dict_proxy_keys, test/test_fx.py::TestFX::test_trace_fn_constant, test/test_fx.py::TestFX::test_trace_function, test/test_fx.py::TestFX::test_trace_multiple_funcs, test/test_fx.py::TestFX::test_trace_return_dataclass, 
test/test_fx.py::TestFX::test_trace_return_dataclass_nested, test/test_fx.py::TestFX::test_trace_return_namedtuple, test/test_fx.py::TestFX::test_tracing_graphmodules_as_leaf_submodules, test/test_fx.py::TestFX::test_transformer_multi_outputs, test/test_fx.py::TestFX::test_transformer_noop, test/test_fx.py::TestFX::test_transformer_op_swap, test/test_fx.py::TestFX::test_transformer_preserves_nn_module_stack_for_get_attr, test/test_fx.py::TestFX::test_tuple_no_subscript, test/test_fx.py::TestFX::test_typename_print, test/test_fx.py::TestFX::test_typename_print_pre_pep585, test/test_fx.py::TestFX::test_unpack, test/test_fx.py::TestFX::test_unpack_dict_better_error, test/test_fx.py::TestFX::test_unpack_list_better_error, test/test_fx.py::TestFX::test_update_args_api, test/test_fx.py::TestFX::test_update_args_kwargs_yells_at_you, test/test_fx.py::TestFX::test_update_kwargs_api, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_function, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_module, test/test_fx.py::TestFX::test_varargs_concrete, test/test_fx.py::TestFX::test_wrap, test/test_fx.py::TestFX::test_wrap_decorated_function, test/test_fx.py::TestFX::test_wrap_fn_directly, test/test_fx.py::TestFX::test_wrap_with_submodule, test/test_fx.py::TestFX::test_wrapped_method, test/test_fx.py::TestFX::test_wrapped_retrace, test/test_fx.py::TestFX::test_wrapped_via_decorator, test/test_fx.py::TestFX::test_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_wrong_target_type, test/test_fx.py::TestFX::test_wrong_topo, test/test_fx.py::TestFXAPIBackwardCompatibility::test_adding_side_effect_function, test/test_fx.py::TestFXAPIBackwardCompatibility::test_class_member_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_function_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_preserve_unused_attr_after_unpickle, test/test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_affine_grid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_batch_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy_with_logits, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_tbc, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose1d, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_similarity, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_ctc_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding_bag, test/test_fx.py::TestFunctionalTracing::test_nn_functional_feature_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gaussian_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_glu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grid_sample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_group_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gumbel_softmax, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardswish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hinge_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_huber_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_instance_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_interpolate, test/test_fx.py::TestFunctionalTracing::test_nn_functional_kl_div, test/test_fx.py::TestFunctionalTracing::test_nn_functional_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_layer_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_linear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_local_response_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_log_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_logsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_margin_ranking_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mse_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_head_attention_forward, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_native_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_normalize, test/test_fx.py::TestFunctionalTracing::test_nn_functional_one_hot, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pad, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pairwise_distance, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pdist, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_unshuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_poisson_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_prelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu6, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rms_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu_, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_dot_product_attention, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_silu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_smooth_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmin, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softplus, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_with_distance_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_unfold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_nearest, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_H_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_T_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___getitem___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___radd___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rdiv___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmatmul___cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmod___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rpow___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rsub___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__chunk_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__softmax_backward_data_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_abs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_add_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcdiv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_decomposed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_alias_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_all_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_allclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_aminmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_angle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_any_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_arange_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argsort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argwhere_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_baddbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bernoulli_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bfloat16_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_block_diag_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bool_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_shapes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bucketize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_byte_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cartesian_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cauchy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdouble_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ceil_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cfloat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chalf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_char_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_inverse_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_max_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_min_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clone_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_column_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_combinations_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_physical_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_constant_pad_nd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_contiguous_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_copysign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_corrcoef_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_count_nonzero_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cov_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_deg2rad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_embed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagflat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diff_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_digamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_floor_rounding_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_double_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_einsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_permuted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_equal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expm1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eye_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft2_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flip_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fliplr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flipud_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmax_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gather_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ge_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geometric_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geqrf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gradient_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_half_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hash_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_heaviside_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_histc_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hypot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igammac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_inner_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_int_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isclose_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isfinite_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isnan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isneginf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isposinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isreal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_item_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_unary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kron_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kthvalue_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ldexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_le_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lerp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lgamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cond_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_det_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eig_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_householder_product_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_ex_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_hermitian_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_slogdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svdvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vander_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vecdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log10_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log1p_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logcumsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_and_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_not_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_or_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_xor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_long_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_unpack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mH_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mT_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_mean_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matrix_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_with_dim_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_maximum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_minimum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_movedim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_msort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_multinomial_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nan_to_num_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmedian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanquantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nansum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_dropout_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ne_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_full_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nextafter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_celu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_similarity_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_ctc_loss_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_elu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_glu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_group_norm_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_kl_div_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pairwise_distance_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_selu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_silu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_static_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_fro_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_inf_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_nuc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_in_place_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_number_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ormqr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_outer_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pca_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pinverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polar_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_3_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_positive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_quantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rad2deg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rand_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ravel_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_real_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reciprocal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_remainder_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_renorm_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_interleave_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize_as__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_roll_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rot90_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scalar_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_add_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_searchsorted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sgn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_short_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_exponential_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hann_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signbit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_mm_reduce_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_airy_ai_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_entr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_erfcx_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i0e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_log_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtri_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_spherical_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_xlog1py_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_zeta_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_list_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_square_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_multiple_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_to_size_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_along_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensor_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensordot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_sparse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_topk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapz_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triangular_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tril_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_true_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trunc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unflatten_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_uniform_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_consecutive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_where_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_xlogy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zero__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_like_cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_alexnet, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_base, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_tiny, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet121, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet161, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet169, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet201, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_320_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fcos_resnet50_fpn, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_keypointrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssd300_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssdlite320_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b0, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b1, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b2, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b3, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b4, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b5, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b6, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b7, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_l, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_m, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_googlenet, test/test_fx.py::TestVisionTracing::test_torchvision_models_inception_v3, test/test_fx.py::TestVisionTracing::test_torchvision_models_maxvit_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_75, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_0, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_128gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet152, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet18, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet34, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_32x8d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_64x4d, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext50_32x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_lraspp_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x2_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_1, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16_bn, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mc3_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v1_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r2plus1d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r3d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_s3d, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_h_14, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet101_2, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet50_2 2025-09-07T06:47:56.9117412Z 2025-09-07T06:47:56.9117627Z Running test_transformers_privateuse1 1/1 ... [2025-09-07 06:47:56.838968] 2025-09-07T06:47:57.1746338Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/open_registration_extension/torch_openreg 2025-09-07T06:47:57.4223757Z Preparing metadata (pyproject.toml) ... 
done 2025-09-07T06:47:57.4254505Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch_openreg==0.0.1) (2.9.0a0+git93fb23d) 2025-09-07T06:47:57.4280044Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (3.19.1) 2025-09-07T06:47:57.4285264Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (4.15.0) 2025-09-07T06:47:57.4289193Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (1.13.3) 2025-09-07T06:47:57.4293494Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (2.8.8) 2025-09-07T06:47:57.4296947Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (3.1.6) 2025-09-07T06:47:57.4301366Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (2025.7.0) 2025-09-07T06:47:57.4673421Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->torch_openreg==0.0.1) (1.3.0) 2025-09-07T06:47:57.4707150Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->torch_openreg==0.0.1) (3.0.2) 2025-09-07T06:47:57.4716185Z Building wheels for collected packages: torch_openreg 2025-09-07T06:48:27.7930983Z Building wheel for torch_openreg (pyproject.toml) ...
done 2025-09-07T06:48:27.7943964Z Created wheel for torch_openreg: filename=torch_openreg-0.0.1-cp310-cp310-linux_x86_64.whl size=286812 sha256=5310e0ff8dbeb0be7d7e2ddd5ce3dfd38e2c2fe717ef8db38c91ea0108e622f0 2025-09-07T06:48:27.7947496Z Stored in directory: /tmp/pip-ephem-wheel-cache-fyubwpjm/wheels/9a/72/40/6361c42b6b8152b0dcc9bb50011cea06828e2866d520b81a3e 2025-09-07T06:48:27.7966141Z Successfully built torch_openreg 2025-09-07T06:48:28.1017267Z Installing collected packages: torch_openreg 2025-09-07T06:48:28.1150371Z Successfully installed torch_openreg-0.0.1 2025-09-07T06:48:28.1604620Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:48:28.1610016Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers_privateuse1.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:48:28.160735] 2025-09-07T06:48:31.8307680Z 2025-09-07T06:48:31.8308805Z test_transformers_privateuse1 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_privateuse1_1.1_dfaf21e0de8c9cf4_.log 2025-09-07T06:48:31.8311293Z Running 3 items in this shard: test/test_transformers_privateuse1.py::TestSDPAPrivateUse1Only::test_fused_sdp_choice_privateuseone, test/test_transformers_privateuse1.py::TestSDPAPrivateUse1Only::test_scaled_dot_product_fused_attention_overrideable, test/test_transformers_privateuse1.py::TestSDPAPrivateUse1Only::test_scaled_dot_product_fused_attention_overrideable_backward 2025-09-07T06:48:31.8313129Z 2025-09-07T06:48:31.8313342Z Running test_openreg 1/1 ... [2025-09-07 06:48:31.831041] 2025-09-07T06:48:32.1772002Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/open_registration_extension/torch_openreg 2025-09-07T06:48:32.4276781Z Preparing metadata (pyproject.toml) ...
done 2025-09-07T06:48:32.4307109Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch_openreg==0.0.1) (2.9.0a0+git93fb23d) 2025-09-07T06:48:32.4332888Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (3.19.1) 2025-09-07T06:48:32.4337577Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (4.15.0) 2025-09-07T06:48:32.4342304Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (1.13.3) 2025-09-07T06:48:32.4345778Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (2.8.8) 2025-09-07T06:48:32.4349058Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (3.1.6) 2025-09-07T06:48:32.4353335Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->torch_openreg==0.0.1) (2025.7.0) 2025-09-07T06:48:32.4726450Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->torch_openreg==0.0.1) (1.3.0) 2025-09-07T06:48:32.4759835Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->torch_openreg==0.0.1) (3.0.2) 2025-09-07T06:48:32.4768936Z Building wheels for collected packages: torch_openreg 2025-09-07T06:48:40.5106061Z Building wheel for torch_openreg (pyproject.toml) ...
done 2025-09-07T06:48:40.5119004Z Created wheel for torch_openreg: filename=torch_openreg-0.0.1-cp310-cp310-linux_x86_64.whl size=286812 sha256=b3d06d8a412625d59570b008bca247d79cbf1a0229c62c557ffe950bd53f1f7a 2025-09-07T06:48:40.5120302Z Stored in directory: /tmp/pip-ephem-wheel-cache-pse6ml8a/wheels/9a/72/40/6361c42b6b8152b0dcc9bb50011cea06828e2866d520b81a3e 2025-09-07T06:48:40.5141579Z Successfully built torch_openreg 2025-09-07T06:48:40.8169872Z Installing collected packages: torch_openreg 2025-09-07T06:48:40.8309348Z Successfully installed torch_openreg-0.0.1 2025-09-07T06:48:40.8787853Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:48:40.8792473Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_openreg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:48:40.879003] 2025-09-07T06:48:44.6489951Z 2025-09-07T06:48:44.6491146Z test_openreg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_openreg_1.1_f667fca7ad88b1ee_.log 2025-09-07T06:48:44.6503948Z Running 44 items in this shard: test/test_openreg.py::TestPrivateUse1::test_backend_dispatchstub, test/test_openreg.py::TestPrivateUse1::test_backend_generate_methods, test/test_openreg.py::TestPrivateUse1::test_backend_module_function, test/test_openreg.py::TestPrivateUse1::test_backend_module_methods, test/test_openreg.py::TestPrivateUse1::test_backend_module_registration, test/test_openreg.py::TestPrivateUse1::test_backend_name, test/test_openreg.py::TestPrivateUse1::test_backend_operator_registration, test/test_openreg.py::TestPrivateUse1::test_backend_packed_sequence_methods, test/test_openreg.py::TestPrivateUse1::test_backend_storage_methods, test/test_openreg.py::TestPrivateUse1::test_backend_tensor_methods,
test/test_openreg.py::TestPrivateUse1::test_backend_tensor_type, test/test_openreg.py::TestPrivateUse1::test_backend_type_methods, test/test_openreg.py::TestOpenReg::test_autograd_init, test/test_openreg.py::TestOpenReg::test_compile_autograd_function_aliasing, test/test_openreg.py::TestOpenReg::test_compile_autograd_function_returns_self, test/test_openreg.py::TestOpenReg::test_copy_same_device, test/test_openreg.py::TestOpenReg::test_cross_device_copy, test/test_openreg.py::TestOpenReg::test_cross_diff_devices_copy, test/test_openreg.py::TestOpenReg::test_data_dependent_output, test/test_openreg.py::TestOpenReg::test_event_elapsed_time, test/test_openreg.py::TestOpenReg::test_event_wait_stream, test/test_openreg.py::TestOpenReg::test_expand, test/test_openreg.py::TestOpenReg::test_factory, test/test_openreg.py::TestOpenReg::test_fake_tensor, test/test_openreg.py::TestOpenReg::test_generator, test/test_openreg.py::TestOpenReg::test_manual_seed, test/test_openreg.py::TestOpenReg::test_named_tensor, test/test_openreg.py::TestOpenReg::test_open_device_cpu_serialization, test/test_openreg.py::TestOpenReg::test_open_device_dlpack, test/test_openreg.py::TestOpenReg::test_open_device_numpy_serialization, test/test_openreg.py::TestOpenReg::test_pin_memory, test/test_openreg.py::TestOpenReg::test_printing, test/test_openreg.py::TestOpenReg::test_quantize, test/test_openreg.py::TestOpenReg::test_record_event, test/test_openreg.py::TestOpenReg::test_resize, test/test_openreg.py::TestOpenReg::test_rewrapped_storage, test/test_openreg.py::TestOpenReg::test_rng_state, test/test_openreg.py::TestOpenReg::test_scalar_type_fallback, test/test_openreg.py::TestOpenReg::test_serialization, test/test_openreg.py::TestOpenReg::test_stream_synchronize, test/test_openreg.py::TestOpenReg::test_stream_wait_event, test/test_openreg.py::TestOpenReg::test_stream_wait_stream, test/test_openreg.py::TestOpenReg::test_tensor_type_fallback, 
test/test_openreg.py::TestOpenReg::test_tensorlist_type_fallback 2025-09-07T06:48:44.6512815Z 2025-09-07T06:48:44.6513028Z Running inductor/test_benchmark_fusion 1/1 ... [2025-09-07 06:48:44.649335] 2025-09-07T06:48:44.6513427Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:48:44.6514361Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:48:44.649683] 2025-09-07T06:51:39.9062406Z 2025-09-07T06:51:39.9063841Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_d006d686d814d2a9_.log 2025-09-07T06:51:39.9073826Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionCudaTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionCudaTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionCudaTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, 
test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu 2025-09-07T06:51:39.9079381Z 2025-09-07T06:51:39.9079564Z Running test_show_pickle 1/1 ... [2025-09-07 06:51:39.906451] 2025-09-07T06:51:39.9079910Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:51:39.9080776Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_show_pickle.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:51:39.906770] 2025-09-07T06:51:43.6267813Z 2025-09-07T06:51:43.6268902Z test_show_pickle 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_show_pickle_1.1_1365a6e3650dec4f_.log 2025-09-07T06:51:43.6270287Z Running 1 items in this shard: test/test_show_pickle.py::TestShowPickle::test_scripted_model 2025-09-07T06:51:43.6271550Z 2025-09-07T06:51:43.6271891Z Running test_tensorexpr 1/1 ... [2025-09-07 06:51:43.626946] 2025-09-07T06:51:43.6272578Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:51:43.6274960Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_tensorexpr.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:51:43.627231] 2025-09-07T06:51:47.4974291Z 2025-09-07T06:51:47.4975393Z test_tensorexpr 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_1.1_68be18d62c7aec01_.log 2025-09-07T06:51:47.4996577Z Running 74 items in this shard: test/test_tensorexpr.py::TestTensorExprFuser::test_add_const_rhs, test/test_tensorexpr.py::TestTensorExprFuser::test_add_sub, test/test_tensorexpr.py::TestTensorExprFuser::test_alias_analysis_input_and_module, test/test_tensorexpr.py::TestTensorExprFuser::test_alias_analysis_inputs, test/test_tensorexpr.py::TestTensorExprFuser::test_alias_analysis_module, test/test_tensorexpr.py::TestTensorExprFuser::test_all_combos, test/test_tensorexpr.py::TestTensorExprFuser::test_alpha, test/test_tensorexpr.py::TestTensorExprFuser::test_binary_ops, test/test_tensorexpr.py::TestTensorExprFuser::test_bitwise_ops, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast3, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast_2, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast_big2, test/test_tensorexpr.py::TestTensorExprFuser::test_cat, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_empty_tensors, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_negative_dim, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_only, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_promote_inputs, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_with_constant_dim, test/test_tensorexpr.py::TestTensorExprFuser::test_char, test/test_tensorexpr.py::TestTensorExprFuser::test_chunk, test/test_tensorexpr.py::TestTensorExprFuser::test_clamp, test/test_tensorexpr.py::TestTensorExprFuser::test_constant, test/test_tensorexpr.py::TestTensorExprFuser::test_double, test/test_tensorexpr.py::TestTensorExprFuser::test_double_intrinsics, test/test_tensorexpr.py::TestTensorExprFuser::test_dynamic_shape, 
test/test_tensorexpr.py::TestTensorExprFuser::test_easy, test/test_tensorexpr.py::TestTensorExprFuser::test_eq, test/test_tensorexpr.py::TestTensorExprFuser::test_exp_pow, test/test_tensorexpr.py::TestTensorExprFuser::test_four_arg, test/test_tensorexpr.py::TestTensorExprFuser::test_ge, test/test_tensorexpr.py::TestTensorExprFuser::test_gt, test/test_tensorexpr.py::TestTensorExprFuser::test_guard_fails, test/test_tensorexpr.py::TestTensorExprFuser::test_half_bn_relu, test/test_tensorexpr.py::TestTensorExprFuser::test_half_gelu, test/test_tensorexpr.py::TestTensorExprFuser::test_int64_promotion, test/test_tensorexpr.py::TestTensorExprFuser::test_int_output, test/test_tensorexpr.py::TestTensorExprFuser::test_le, test/test_tensorexpr.py::TestTensorExprFuser::test_loop, test/test_tensorexpr.py::TestTensorExprFuser::test_lt, test/test_tensorexpr.py::TestTensorExprFuser::test_mask, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction2, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction_dim1, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction_dim1_2, test/test_tensorexpr.py::TestTensorExprFuser::test_multi_rand, test/test_tensorexpr.py::TestTensorExprFuser::test_multioutput, test/test_tensorexpr.py::TestTensorExprFuser::test_multiple_outputs, test/test_tensorexpr.py::TestTensorExprFuser::test_nans, test/test_tensorexpr.py::TestTensorExprFuser::test_ne, test/test_tensorexpr.py::TestTensorExprFuser::test_promotion, test/test_tensorexpr.py::TestTensorExprFuser::test_propagated_mem_layout, test/test_tensorexpr.py::TestTensorExprFuser::test_rand_like, test/test_tensorexpr.py::TestTensorExprFuser::test_rank_two, test/test_tensorexpr.py::TestTensorExprFuser::test_relu, test/test_tensorexpr.py::TestTensorExprFuser::test_remainder, test/test_tensorexpr.py::TestTensorExprFuser::test_reps, 
test/test_tensorexpr.py::TestTensorExprFuser::test_round_2, test/test_tensorexpr.py::TestTensorExprFuser::test_scalar, test/test_tensorexpr.py::TestTensorExprFuser::test_short, test/test_tensorexpr.py::TestTensorExprFuser::test_simple_add, test/test_tensorexpr.py::TestTensorExprFuser::test_sin_pow, test/test_tensorexpr.py::TestTensorExprFuser::test_slice, test/test_tensorexpr.py::TestTensorExprFuser::test_sliced_stride, test/test_tensorexpr.py::TestTensorExprFuser::test_softmax_cpu, test/test_tensorexpr.py::TestTensorExprFuser::test_softmax_cuda, test/test_tensorexpr.py::TestTensorExprFuser::test_strided_output_preserved, test/test_tensorexpr.py::TestTensorExprFuser::test_three_arg, test/test_tensorexpr.py::TestTensorExprFuser::test_three_arg2, test/test_tensorexpr.py::TestTensorExprFuser::test_transpose, test/test_tensorexpr.py::TestTensorExprFuser::test_unary_ops, test/test_tensorexpr.py::TestTensorExprFuser::test_unsqueeze, test/test_tensorexpr.py::TestTensorExprFuser::test_where 2025-09-07T06:51:47.5011839Z 2025-09-07T06:51:47.5012046Z Running inductor/test_max_autotune 1/1 ... [2025-09-07 06:51:47.497998] 2025-09-07T06:51:47.5012439Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:51:47.5013342Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_max_autotune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:51:47.498305] 2025-09-07T06:51:54.6731393Z 2025-09-07T06:51:54.6732439Z inductor/test_max_autotune 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_max_autotune_1.1_e33f6a3473ae301c_.log 2025-09-07T06:51:54.6801374Z Running 181 items in this shard: test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_conv1x1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_device_guard, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_addmm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_addmm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_baddbmm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_baddbmm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_bmm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_bmm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_plus_mm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_plus_mm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_baddmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_cat_addmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_cat_max_autotune_extern, test/inductor/test_max_autotune.py::TestMaxAutotune::test_cat_max_autotune_triton, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv1x1_with_free_symbols, test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv3d, test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv_backend, test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv_cat, test/inductor/test_max_autotune.py::TestMaxAutotune::test_empty_conv_input, test/inductor/test_max_autotune.py::TestMaxAutotune::test_empty_conv_input_with_1x1_kernel, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout0_op_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout0_op_scaled_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_0_op_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_0_op_scaled_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_27_op_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_27_op_scaled_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_inf_timing_multi_template_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_inf_timing_multi_template_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_jit_fusion_matches_aot_fusion, test/inductor/test_max_autotune.py::TestMaxAutotune::test_linear_and_cel, test/inductor/test_max_autotune.py::TestMaxAutotune::test_matmul_dropout_device_cpu, test/inductor/test_max_autotune.py::TestMaxAutotune::test_matmul_dropout_device_cuda, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_True, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_illegal_alignment_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_illegal_alignment_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_tma_dynamic_outer_dim, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_zero_size_input_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_zero_size_input_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_bfloat16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float16_sizes1, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float32_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float32_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float32_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_bfloat16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float32_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float32_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float32_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_non_contiguous_second_matrix_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_non_contiguous_second_matrix_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_with_epilogue, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes0, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_float16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_bfloat16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_float16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_input, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_input_bwd, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_0_decompose_k_threshold_16, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_0_decompose_k_threshold_8, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_20_decompose_k_threshold_16, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_20_decompose_k_threshold_8, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_5_decompose_k_threshold_16, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_5_decompose_k_threshold_8, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_output_stride, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_exhaustive, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_mm_plus_mm_zero_size_input_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_mm_plus_mm_zero_size_input_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_prune_choices, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_False, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_illegal_alignment_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_illegal_alignment_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_tma_dynamic_outer_dim, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_zero_size_input_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_zero_size_input_dynamic_True, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_mm_k_1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_mutation_rename, test/inductor/test_max_autotune.py::TestMaxAutotune::test_no_valid_choices, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_addmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_bmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_mm_plus_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_cache_key, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_cache_strategy, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_caching, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_caching_bmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_caching_mm_plus_mm, test/inductor/test_max_autotune.py::TestMaxAutotunePrecompile::test_filled_cache_precompile, test/inductor/test_max_autotune.py::TestMaxAutotunePrecompile::test_precompilation_threads, test/inductor/test_max_autotune.py::TestMaxAutotunePrecompile::test_precompilations, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_benchmark_choice_fail_in_subproc, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_benchmark_choice_in_subproc, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_addmm_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_addmm_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_False_autotune_multi_device_False, 
test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_False_autotune_multi_device_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_True_autotune_multi_device_False, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_True_autotune_multi_device_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_regular_mm_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_regular_mm_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_triton_template_with_epilogues_and_dynamic_shape, test/inductor/test_max_autotune.py::TestMaxAutotuneRemoteCache::test_max_autotune_remote_caching_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotuneRemoteCache::test_max_autotune_remote_caching_dynamic_True, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_crash, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_exception, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_killed, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_timeout, test/inductor/test_max_autotune.py::TestTuningProcess::test_visible_devices, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_add_feedback_saver, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_clear_feedback_savers, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_feedback_saver_integration, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_tuning_pool_crash, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_tuning_pool_multiple_devices, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_tuning_pool_timeout, test/inductor/test_max_autotune.py::TestPrologueFusion::test_broadcast_x_K_63, 
test/inductor/test_max_autotune.py::TestPrologueFusion::test_broadcast_x_K_64, test/inductor/test_max_autotune.py::TestPrologueFusion::test_broadcast_y, test/inductor/test_max_autotune.py::TestPrologueFusion::test_downcast, test/inductor/test_max_autotune.py::TestPrologueFusion::test_gather_fusion, test/inductor/test_max_autotune.py::TestPrologueFusion::test_low_precision, test/inductor/test_max_autotune.py::TestPrologueFusion::test_mismatched_prologue_group, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_fusions_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_fusions_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_fusions_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_inputs_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_inputs_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_inputs_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_pending_fusion_pro_and_epi, test/inductor/test_max_autotune.py::TestPrologueFusion::test_pending_fusions_multiple, test/inductor/test_max_autotune.py::TestPrologueFusion::test_preserves_zero_analysis, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_masked_load_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_masked_load_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_masked_load_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_multiple_nodes_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_multiple_nodes_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_multiple_nodes_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_read_into_both_inputs_benchmark_fusion_False, 
test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_read_into_both_inputs_benchmark_fusion_True, test/inductor/test_max_autotune.py::TestPrologueFusion::test_storage_offset_prologue, test/inductor/test_max_autotune.py::TestPrologueFusion::test_upcast_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_upcast_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_upcast_sizes2 2025-09-07T06:51:54.6863663Z 2025-09-07T06:51:54.6863836Z Running test_multiprocessing 1/1 ... [2025-09-07 06:51:54.673821] 2025-09-07T06:51:54.6864195Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:51:54.6865051Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_multiprocessing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:51:54.674128] 2025-09-07T06:51:58.4442545Z 2025-09-07T06:51:58.4443473Z test_multiprocessing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_multiprocessing_1.1_8d0fc9271907d4f2_.log 2025-09-07T06:51:58.4457036Z Running 42 items in this shard: test/test_multiprocessing.py::TestMultiprocessing::test_autograd_errors, test/test_multiprocessing.py::TestMultiprocessing::test_autograd_fine_with_spawn, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_bad_call, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_ipc_deadlock, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_memory_allocation, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_parameter_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_send_many, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_simple, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_small_tensors, test/test_multiprocessing.py::TestMultiprocessing::test_cuda_variable_sharing, 
test/test_multiprocessing.py::TestMultiprocessing::test_empty_shared, test/test_multiprocessing.py::TestMultiprocessing::test_empty_tensor_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_empty_tensor_sharing_cuda, test/test_multiprocessing.py::TestMultiprocessing::test_empty_tensor_sharing_meta, test/test_multiprocessing.py::TestMultiprocessing::test_event, test/test_multiprocessing.py::TestMultiprocessing::test_event_handle_exporter, test/test_multiprocessing.py::TestMultiprocessing::test_event_handle_importer, test/test_multiprocessing.py::TestMultiprocessing::test_event_handle_multi_gpu, test/test_multiprocessing.py::TestMultiprocessing::test_event_multiprocess, test/test_multiprocessing.py::TestMultiprocessing::test_fd_pool, test/test_multiprocessing.py::TestMultiprocessing::test_fd_preserve_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_fd_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_fs, test/test_multiprocessing.py::TestMultiprocessing::test_fs_is_shared, test/test_multiprocessing.py::TestMultiprocessing::test_fs_pool, test/test_multiprocessing.py::TestMultiprocessing::test_fs_preserve_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_fs_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_inherit_tensor, test/test_multiprocessing.py::TestMultiprocessing::test_integer_parameter_serialization_cpu, test/test_multiprocessing.py::TestMultiprocessing::test_integer_parameter_serialization_cuda, test/test_multiprocessing.py::TestMultiprocessing::test_is_shared, test/test_multiprocessing.py::TestMultiprocessing::test_is_shared_cuda, test/test_multiprocessing.py::TestMultiprocessing::test_leaf_variable_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_meta_simple, test/test_multiprocessing.py::TestMultiprocessing::test_mixed_types_cuda_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_non_leaf_variable_sharing, 
test/test_multiprocessing.py::TestMultiprocessing::test_parameter_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_rebuild_cuda_tensor, test/test_multiprocessing.py::TestMultiprocessing::test_set_thread_name, test/test_multiprocessing.py::TestMultiprocessing::test_tensor_sharing_meta, test/test_multiprocessing.py::TestMultiprocessing::test_variable_sharing, test/test_multiprocessing.py::TestMultiprocessing::test_wrong_cuda_fork 2025-09-07T06:51:58.4467733Z 2025-09-07T06:51:58.4467879Z Running test_dispatch 1/1 ... [2025-09-07 06:51:58.444762] 2025-09-07T06:51:58.4468202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:51:58.4469025Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:51:58.445062] 2025-09-07T06:52:02.1654740Z 2025-09-07T06:52:02.1656128Z test_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dispatch_1.1_804050e0bad08106_.log 2025-09-07T06:52:02.1667732Z Running 32 items in this shard: test/test_dispatch.py::TestDispatch::test_all_invariants, test/test_dispatch.py::TestDispatch::test_computed_table, test/test_dispatch.py::TestDispatch::test_computed_table_with_ambiguous_autogradother, test/test_dispatch.py::TestDispatch::test_computed_table_with_autograd, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_math, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_math_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_math, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_math_autogradcpu_fallthrough, 
test/test_dispatch.py::TestDispatch::test_computed_table_with_math, test/test_dispatch.py::TestDispatch::test_def, test/test_dispatch.py::TestDispatch::test_def_impl_schema_mismatch, test/test_dispatch.py::TestDispatch::test_def_only, test/test_dispatch.py::TestDispatch::test_def_with_explicit_alias, test/test_dispatch.py::TestDispatch::test_def_with_inference, test/test_dispatch.py::TestDispatch::test_dispatch_print_registrations_for_dispatch_key_invalid, test/test_dispatch.py::TestDispatch::test_find_dangling_impls, test/test_dispatch.py::TestDispatch::test_find_dangling_impls_ext, test/test_dispatch.py::TestDispatch::test_impl_only, test/test_dispatch.py::TestDispatch::test_multiple_def_alias_defaulting, test/test_dispatch.py::TestDispatch::test_multiple_def_alias_mismatch, test/test_dispatch.py::TestDispatch::test_multiple_def_error, test/test_dispatch.py::TestDispatch::test_multiple_fallback, test/test_dispatch.py::TestDispatch::test_overwrite_math, test/test_dispatch.py::TestPythonDispatcher::test_autogradother, test/test_dispatch.py::TestPythonDispatcher::test_basic, test/test_dispatch.py::TestPythonDispatcher::test_defaultbackend_autogradcpu, test/test_dispatch.py::TestPythonDispatcher::test_defaultbackend_math, test/test_dispatch.py::TestPythonDispatcher::test_duplicate_registrations, test/test_dispatch.py::TestPythonDispatcher::test_math_autogradcpu, test/test_dispatch.py::TestPythonDispatcher::test_quantized_structured_not_implemented 2025-09-07T06:52:02.1675245Z 2025-09-07T06:52:02.1675669Z Running test_namedtuple_return_api 1/1 ... [2025-09-07 06:52:02.165787] 2025-09-07T06:52:02.1676057Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:52:02.1676932Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_namedtuple_return_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:52:02.166089] 2025-09-07T06:52:05.8359790Z 2025-09-07T06:52:05.8361058Z test_namedtuple_return_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_namedtuple_return_api_1.1_74d55665462dbd14_.log 2025-09-07T06:52:05.8363856Z Running 3 items in this shard: test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_import_return_types, test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_namedtuple_return, test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_native_functions_yaml 2025-09-07T06:52:05.8365722Z 2025-09-07T06:52:05.8366638Z Running test_cpp_extensions_mtia_backend 1/1 ... [2025-09-07 06:52:05.836435] 2025-09-07T06:52:05.8367301Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:52:05.8370576Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_mtia_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:52:05.836824] 2025-09-07T06:52:09.1563104Z 2025-09-07T06:52:09.1564531Z test_cpp_extensions_mtia_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_mtia_backend_1.1_0d6ccaca74d8a2f2_.log 2025-09-07T06:52:09.1569847Z Running 5 items in this shard: test/test_cpp_extensions_mtia_backend.py::TestCppExtensionMTIABackend::test_device_context, test/test_cpp_extensions_mtia_backend.py::TestCppExtensionMTIABackend::test_get_device_module, test/test_cpp_extensions_mtia_backend.py::TestCppExtensionMTIABackend::test_stream_basic, test/test_cpp_extensions_mtia_backend.py::TestCppExtensionMTIABackend::test_stream_context, test/test_cpp_extensions_mtia_backend.py::TestCppExtensionMTIABackend::test_stream_context_different_device 2025-09-07T06:52:09.1572791Z 2025-09-07T06:52:09.1573080Z Running test_jit_disabled 1/1 ... 
[2025-09-07 06:52:09.156726] 2025-09-07T06:52:09.1573646Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:52:09.1575235Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_disabled.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:52:09.157110] 2025-09-07T06:52:12.8770923Z 2025-09-07T06:52:12.8772164Z test_jit_disabled 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_disabled_1.1_d8d1477887b78844_.log 2025-09-07T06:52:12.8774509Z Running 3 items in this shard: test/test_jit_disabled.py::TestJitDisabled::test_attribute, test/test_jit_disabled.py::TestJitDisabled::test_recursive_script, test/test_jit_disabled.py::TestJitDisabled::test_script_module_construction 2025-09-07T06:52:12.8775995Z 2025-09-07T06:52:12.8776341Z Running test_fake_tensor 1/1 ... [2025-09-07 06:52:12.877295] 2025-09-07T06:52:12.8777018Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:52:12.8778719Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fake_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:52:12.877600] 2025-09-07T06:52:21.1040413Z 2025-09-07T06:52:21.1041716Z test_fake_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fake_tensor_1.1_5d7bfdd003997d3e_.log 2025-09-07T06:52:21.1133399Z Running 279 items in this shard: test/test_fake_tensor.py::FakeTensorTest::test__adaptive_avg_pool2d_backward, test/test_fake_tensor.py::FakeTensorTest::test_alias_call, test/test_fake_tensor.py::FakeTensorTest::test_allow_meta, test/test_fake_tensor.py::FakeTensorTest::test_aten_copy_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_aten_index_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_aten_slice_scatter_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_basic, test/test_fake_tensor.py::FakeTensorTest::test_batch_tensor, test/test_fake_tensor.py::FakeTensorTest::test_binary_op_type_promotion, test/test_fake_tensor.py::FakeTensorTest::test_constructor, test/test_fake_tensor.py::FakeTensorTest::test_convert_fake_to_real, test/test_fake_tensor.py::FakeTensorTest::test_cpu_fallback, test/test_fake_tensor.py::FakeTensorTest::test_cuda_initialized, test/test_fake_tensor.py::FakeTensorTest::test_cuda_lstm, test/test_fake_tensor.py::FakeTensorTest::test_cudnn_rnn_with_fallback, test/test_fake_tensor.py::FakeTensorTest::test_cudnn_rnn_without_fallback, test/test_fake_tensor.py::FakeTensorTest::test_custom_op_fallback, test/test_fake_tensor.py::FakeTensorTest::test_data_dependent_operator, test/test_fake_tensor.py::FakeTensorTest::test_deepcopy, test/test_fake_tensor.py::FakeTensorTest::test_device_inplace_copy, test/test_fake_tensor.py::FakeTensorTest::test_embedding_bag_meta, test/test_fake_tensor.py::FakeTensorTest::test_export_numpy, test/test_fake_tensor.py::FakeTensorTest::test_fake_device, test/test_fake_tensor.py::FakeTensorTest::test_fake_dispatch_keys, test/test_fake_tensor.py::FakeTensorTest::test_fake_grad_copy, 
test/test_fake_tensor.py::FakeTensorTest::test_fake_mode_error, test/test_fake_tensor.py::FakeTensorTest::test_fast_div, test/test_fake_tensor.py::FakeTensorTest::test_from_numpy, test/test_fake_tensor.py::FakeTensorTest::test_fsdp_flat_param, test/test_fake_tensor.py::FakeTensorTest::test_full, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_complex128, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_complex64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float32, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fn, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fnuz, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e5m2, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e5m2fnuz, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int16, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int32, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int8, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_uint8, test/test_fake_tensor.py::FakeTensorTest::test_index_put_error, test/test_fake_tensor.py::FakeTensorTest::test_jagged_fake_to_fake_preserved, test/test_fake_tensor.py::FakeTensorTest::test_like_constructor, test/test_fake_tensor.py::FakeTensorTest::test_mixed_real_and_fake_inputs, test/test_fake_tensor.py::FakeTensorTest::test_mode, test/test_fake_tensor.py::FakeTensorTest::test_nan_to_num, test/test_fake_tensor.py::FakeTensorTest::test_nanmean_out, test/test_fake_tensor.py::FakeTensorTest::test_new, test/test_fake_tensor.py::FakeTensorTest::test_no_tag_func, test/test_fake_tensor.py::FakeTensorTest::test_non_kwarg_device, 
test/test_fake_tensor.py::FakeTensorTest::test_non_overlapping_stride_zero, test/test_fake_tensor.py::FakeTensorTest::test_non_parameter_grad, test/test_fake_tensor.py::FakeTensorTest::test_normalize_device, test/test_fake_tensor.py::FakeTensorTest::test_op_with_zero_dim_bypassed, test/test_fake_tensor.py::FakeTensorTest::test_out_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_parameter_instantiation, test/test_fake_tensor.py::FakeTensorTest::test_parameter_view, test/test_fake_tensor.py::FakeTensorTest::test_print_in_fake_mode, test/test_fake_tensor.py::FakeTensorTest::test_randperm, test/test_fake_tensor.py::FakeTensorTest::test_recursive_invocation, test/test_fake_tensor.py::FakeTensorTest::test_repr, test/test_fake_tensor.py::FakeTensorTest::test_same_shape_env_preserved, test/test_fake_tensor.py::FakeTensorTest::test_scalar_inputs, test/test_fake_tensor.py::FakeTensorTest::test_scan_reverse_False, test/test_fake_tensor.py::FakeTensorTest::test_scan_reverse_True, test/test_fake_tensor.py::FakeTensorTest::test_setitem, test/test_fake_tensor.py::FakeTensorTest::test_shape_take_not_device, test/test_fake_tensor.py::FakeTensorTest::test_split_return_self, test/test_fake_tensor.py::FakeTensorTest::test_throw, test/test_fake_tensor.py::FakeTensorTest::test_tolist, test/test_fake_tensor.py::FakeTensorTest::test_type_as, test/test_fake_tensor.py::FakeTensorTest::test_unbind_copy_out, test/test_fake_tensor.py::FakeTensorTest::test_unsqueeze_copy, test/test_fake_tensor.py::FakeTensorTest::test_upsample_bilinear_small_channels, test/test_fake_tensor.py::FakeTensorTest::test_zero_dim, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test__adaptive_avg_pool2d_backward_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_alias_call_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_allow_meta_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_copy_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_index_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_slice_scatter_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_basic_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_batch_tensor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_binary_op_type_promotion_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_constructor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_convert_fake_to_real_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cpu_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cuda_initialized_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cuda_lstm_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cudnn_rnn_with_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cudnn_rnn_without_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_custom_op_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_data_dependent_operator_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_deepcopy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_device_inplace_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_embedding_bag_meta_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_export_numpy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_dispatch_keys_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_grad_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_mode_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fast_div_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_from_numpy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fsdp_flat_param_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_full_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_complex128_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_complex64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float32_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fn_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fnuz_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e5m2_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e5m2fnuz_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int16_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int32_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int8_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_uint8_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_put_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_jagged_fake_to_fake_preserved_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_like_constructor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_mixed_real_and_fake_inputs_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_nan_to_num_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_nanmean_out_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_new_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_no_tag_func_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_kwarg_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_overlapping_stride_zero_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_parameter_grad_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_normalize_device_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_op_with_zero_dim_bypassed_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_out_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_parameter_instantiation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_parameter_view_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_print_in_fake_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_randperm_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_recursive_invocation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_repr_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_same_shape_env_preserved_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scalar_inputs_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scan_reverse_False_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scan_reverse_True_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_setitem_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_shape_take_not_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_split_return_self_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_throw_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_tolist_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_type_as_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_unbind_copy_out_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_unsqueeze_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_upsample_bilinear_small_channels_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_zero_dim_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorConstHandling::test_aliased_const_write, test/test_fake_tensor.py::FakeTensorConstHandling::test_constant_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_constant_propagate_through_functions, test/test_fake_tensor.py::FakeTensorConstHandling::test_fake_tensor_batch_norm_cpu, test/test_fake_tensor.py::FakeTensorConstHandling::test_fake_tensor_in_intlist_repro, test/test_fake_tensor.py::FakeTensorConstHandling::test_inplace_add, test/test_fake_tensor.py::FakeTensorConstHandling::test_inplace_view_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_shared_storage_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_shared_storages, test/test_fake_tensor.py::FakeTensorConstHandling::test_simple, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_aliased_const_write_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_constant_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_constant_propagate_through_functions_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_fake_tensor_batch_norm_cpu_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_fake_tensor_in_intlist_repro_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_inplace_add_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_inplace_view_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_shared_storage_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_shared_storages_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_simple_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyCatCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyCubeCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyMulCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyMulScalarCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyNMSCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyNonzeroCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySortCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySplitCopyCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyTakeCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyViewCopyCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorConverterTest::test_dead_key, test/test_fake_tensor.py::FakeTensorConverterTest::test_dead_weak_ref, test/test_fake_tensor.py::FakeTensorConverterTest::test_memoized_conversion_from_meta, test/test_fake_tensor.py::FakeTensorConverterTest::test_memoized_conversion_to_meta, test/test_fake_tensor.py::FakeTensorConverterTest::test_multiple_modes, test/test_fake_tensor.py::FakeTensorConverterTest::test_no_active_mode, 
test/test_fake_tensor.py::FakeTensorConverterTest::test_no_ref_cycle, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_mode_error, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_tensor_storages_non_view, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_tensor_storages_view, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_dead_key_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_dead_weak_ref_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_memoized_conversion_from_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_memoized_conversion_to_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_multiple_modes_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_no_active_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_no_ref_cycle_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_mode_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_tensor_storages_non_view_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_tensor_storages_view_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_conv_c1_backward, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_cross_entropy_loss, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_embedding_bag_private, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_fake_gpu_no_init, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_flash_attention, 
test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_like_ops, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_module_to, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_no_dispatch_with_like_function, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_non_kwarg_only_device, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_sparse_new, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_str_storage, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_tensor_constructors_all_have_kwarg_device, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_tensor_new, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_conv_c1_backward_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_cross_entropy_loss_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_embedding_bag_private_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_fake_gpu_no_init_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_flash_attention_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_like_ops_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_module_to_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_no_dispatch_with_like_function_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_non_kwarg_only_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_sparse_new_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_str_storage_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_tensor_constructors_all_have_kwarg_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_tensor_new_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorPropTest::test_fake_tensor_prop_on_nn_module, test/test_fake_tensor.py::FakeTensorPropTest::test_fake_tensor_prop_on_nn_module_with_optional_args, test/test_fake_tensor.py::FakeTensorPropTest::test_nan_to_num, test/test_fake_tensor.py::FakeTensorPropTest::test_nonzero_stride, test/test_fake_tensor.py::FakeTensorPropTest::test_torch_load_with_fake_mode, test/test_fake_tensor.py::FakeTensorPropTest::test_unbacked_shape_realloc, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_fake_tensor_prop_on_nn_module_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_fake_tensor_prop_on_nn_module_with_optional_args_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_nan_to_num_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_nonzero_stride_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_torch_load_with_fake_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_unbacked_shape_realloc_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorSerialization::test_serialization, test/test_fake_tensor.py::FakeTensorSerialization::test_serialization_with_tracing, test/test_fake_tensor.py::FakeTensorDispatchCache::test__upsample_bilinear2d_aa_backward_dynamic_shapes, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_aten_index, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_bypass, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_default_device, 
test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_default_dtype, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_dispatch_key_set, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_hit, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_inplace_op, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_constants, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_device, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_dtype, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_conj, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_inference, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_neg, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_memory_format, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_requires_grad, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_shape, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_storage_offset, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_stride, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_tuple_outputs, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_view_op, test/test_fake_tensor.py::FakeTensorDispatchCache::test_fft_hfft2_issue145522, test/test_fake_tensor.py::FakeTensorDispatchCache::test_from_buffer, test/test_fake_tensor.py::FakeTensorDispatchCache::test_inference_mode, test/test_fake_tensor.py::FakeTensorDispatchCache::test_invoke_subgraph, test/test_fake_tensor.py::FakeTensorDispatchCache::test_invoke_subgraph_cacheable_inplace, test/test_fake_tensor.py::FakeTensorDispatchCache::test_meta_tensor_to_fake_cpu, test/test_fake_tensor.py::FakeTensorDispatchCache::test_shape_env_settings, test/test_fake_tensor.py::FakeTensorDispatchCache::test_unbacked_output, 
test/test_fake_tensor.py::FakeTensorDispatchCache::test_wrapper_tensor_subclass_different_device, test/test_fake_tensor.py::FakeTensorPreferDeviceType::test_fake_tensor_prefer_device_type, test/test_fake_tensor.py::FakeTensorPreferDeviceType::test_fake_tensor_prefer_device_type_cpu_only
2025-09-07T06:52:21.1220170Z
2025-09-07T06:52:21.1220356Z Running test_cuda_trace 1/1 ... [2025-09-07 06:52:21.104722]
2025-09-07T06:52:21.1220697Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:52:21.1221608Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_trace.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:52:21.105025]
2025-09-07T06:53:07.0827472Z
2025-09-07T06:53:07.0828704Z test_cuda_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_trace_1.1_11c876805a7ce33a_.log
2025-09-07T06:53:07.0836344Z Running 12 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called, test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback, test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization, test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback
2025-09-07T06:53:07.0841613Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called
2025-09-07T06:53:07.0842547Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback
2025-09-07T06:53:07.0843410Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback
2025-09-07T06:53:07.0844212Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback
2025-09-07T06:53:07.0844995Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback
2025-09-07T06:53:07.0845818Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback
2025-09-07T06:53:07.0846639Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback
2025-09-07T06:53:07.0847431Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization
2025-09-07T06:53:07.0848238Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback
2025-09-07T06:53:07.0849070Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback
2025-09-07T06:53:07.0851852Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback
2025-09-07T06:53:07.0852677Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback
2025-09-07T06:53:07.0853068Z
2025-09-07T06:53:07.0853285Z Running test_cpp_extensions_stream_and_event 1/1 ... [2025-09-07 06:53:07.083141]
2025-09-07T06:53:07.0853688Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:53:07.0855127Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_stream_and_event.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:53:07.083490]
2025-09-07T06:53:10.3528115Z
2025-09-07T06:53:10.3529423Z test_cpp_extensions_stream_and_event 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_stream_and_event_1.1_10f68a7e429a4d7d_.log
2025-09-07T06:53:10.3531183Z Running 1 items in this shard: test/test_cpp_extensions_stream_and_event.py::TestCppExtensionStreamAndEvent::test_stream_event
2025-09-07T06:53:10.3531999Z
2025-09-07T06:53:10.3532721Z Running test_python_dispatch 1/1 ... [2025-09-07 06:53:10.352997]
2025-09-07T06:53:10.3533382Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:53:10.3535558Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_python_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-09-07 06:53:10.353289] 2025-09-07T06:53:15.6253996Z 2025-09-07T06:53:15.6255523Z test_python_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_python_dispatch_1.1_dc5e2156662ed73a_.log 2025-09-07T06:53:15.6294334Z Running 119 items in this shard: test/test_python_dispatch.py::TestDispatcherPythonBindings::test_call_boxed, test/test_python_dispatch.py::TestPythonRegistration::test_alias_analysis, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library_fragment_no_existing, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library_fragment_with_existing, test/test_python_dispatch.py::TestPythonRegistration::test_dispatcher_error_filenames, test/test_python_dispatch.py::TestPythonRegistration::test_dispatchkeyset_eq, test/test_python_dispatch.py::TestPythonRegistration::test_dispatchkeyset_pickle, test/test_python_dispatch.py::TestPythonRegistration::test_error_for_unsupported_ns_or_kind, test/test_python_dispatch.py::TestPythonRegistration::test_error_if_fn_not_callable, test/test_python_dispatch.py::TestPythonRegistration::test_extend_library_with_dispatch_key_arg, test/test_python_dispatch.py::TestPythonRegistration::test_fallback, test/test_python_dispatch.py::TestPythonRegistration::test_fallback_fallthrough, test/test_python_dispatch.py::TestPythonRegistration::test_fallback_keyset, test/test_python_dispatch.py::TestPythonRegistration::test_fallthrough_for_dense_key_with_meta_in_tls, test/test_python_dispatch.py::TestPythonRegistration::test_finalizer, test/test_python_dispatch.py::TestPythonRegistration::test_override_aten_ops_with_multiple_libraries, test/test_python_dispatch.py::TestPythonRegistration::test_override_cpu_sum, test/test_python_dispatch.py::TestPythonRegistration::test_override_cuda_with_jiterator, 
test/test_python_dispatch.py::TestPythonRegistration::test_register_fallthrough, test/test_python_dispatch.py::TestPythonRegistration::test_returning_symint, test/test_python_dispatch.py::TestPythonDispatch::test_all_same_mode, test/test_python_dispatch.py::TestPythonDispatch::test_autograd_in_attr, test/test_python_dispatch.py::TestPythonDispatch::test_basic, test/test_python_dispatch.py::TestPythonDispatch::test_capture_logs_with_torch_dispatch_mode, test/test_python_dispatch.py::TestPythonDispatch::test_construct_int_tensor, test/test_python_dispatch.py::TestPythonDispatch::test_custom_autograd, test/test_python_dispatch.py::TestPythonDispatch::test_custom_dispatch_mode_not_supports_higher_order_operators, test/test_python_dispatch.py::TestPythonDispatch::test_custom_dispatch_mode_supports_higher_order_operators, test/test_python_dispatch.py::TestPythonDispatch::test_custom_size_policy_dynamic_shapes, test/test_python_dispatch.py::TestPythonDispatch::test_data_ptr_respects_numel_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_non_wrapper_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass_with_clone_returning_different_type, test/test_python_dispatch.py::TestPythonDispatch::test_detach_appears_twice_when_called_once, test/test_python_dispatch.py::TestPythonDispatch::test_device_slowpath, test/test_python_dispatch.py::TestPythonDispatch::test_dim_slowpath, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_call, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_call_list_arg, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_dont_autograd, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_uint64, test/test_python_dispatch.py::TestPythonDispatch::test_error_using_class_method_on_mode, 
test/test_python_dispatch.py::TestPythonDispatch::test_exception_handling, test/test_python_dispatch.py::TestPythonDispatch::test_fancy_strides, test/test_python_dispatch.py::TestPythonDispatch::test_format, test/test_python_dispatch.py::TestPythonDispatch::test_get_cur_mode, test/test_python_dispatch.py::TestPythonDispatch::test_get_mode_stack, test/test_python_dispatch.py::TestPythonDispatch::test_index_put_where_only_index_is_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_invalid_ret, test/test_python_dispatch.py::TestPythonDispatch::test_is_contiguous_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_kwarg_only, test/test_python_dispatch.py::TestPythonDispatch::test_kwarg_only_and_positional_default, test/test_python_dispatch.py::TestPythonDispatch::test_layout_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_like, test/test_python_dispatch.py::TestPythonDispatch::test_list_ret, test/test_python_dispatch.py::TestPythonDispatch::test_make_fx_with_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_make_subclass_with_modes, test/test_python_dispatch.py::TestPythonDispatch::test_make_wrapper_subclass_noalloc, test/test_python_dispatch.py::TestPythonDispatch::test_make_wrapper_subclass_propagates_metadata, test/test_python_dispatch.py::TestPythonDispatch::test_maybe_tuple_bug, test/test_python_dispatch.py::TestPythonDispatch::test_mode_detection, test/test_python_dispatch.py::TestPythonDispatch::test_mode_with_make_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_multiple_ops_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_nested_push_logging_tensor_mode, test/test_python_dispatch.py::TestPythonDispatch::test_nesting_same_mode, test/test_python_dispatch.py::TestPythonDispatch::test_new_ones, test/test_python_dispatch.py::TestPythonDispatch::test_none_wrapping, test/test_python_dispatch.py::TestPythonDispatch::test_notimplemented_mode, 
test/test_python_dispatch.py::TestPythonDispatch::test_optional_tensor_list, test/test_python_dispatch.py::TestPythonDispatch::test_out, test/test_python_dispatch.py::TestPythonDispatch::test_produce_real_type, test/test_python_dispatch.py::TestPythonDispatch::test_record_stream, test/test_python_dispatch.py::TestPythonDispatch::test_return_and_correct_aliasing_gives_correct_stride, test/test_python_dispatch.py::TestPythonDispatch::test_return_stream, test/test_python_dispatch.py::TestPythonDispatch::test_set_data, test/test_python_dispatch.py::TestPythonDispatch::test_shallow_copy_and_detach, test/test_python_dispatch.py::TestPythonDispatch::test_sizes_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_standard_is_not_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_storage, test/test_python_dispatch.py::TestPythonDispatch::test_storage_can_be_converted_to_python_object, test/test_python_dispatch.py::TestPythonDispatch::test_strides_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_subclass_creation, test/test_python_dispatch.py::TestPythonDispatch::test_subclass_priority, test/test_python_dispatch.py::TestPythonDispatch::test_sym_sizes_strides_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_tolist_numpy_with_torch_dispatch_mode, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_basic, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_respects_no_dispatch, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_subclass_priority, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_unrelated_tensors, test/test_python_dispatch.py::TestPythonDispatch::test_version, test/test_python_dispatch.py::TestPythonDispatch::test_view_returns_alias_under_torch_dispatch, test/test_python_dispatch.py::TestPythonDispatch::test_with_mode_created_separately, 
test/test_python_dispatch.py::TestPythonDispatch::test_with_nested_modes, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_extra_dispatch_keys, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_multiprocessing_preserves_dtype, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_reentrant_dispatch_with_mode, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_serializes, test/test_python_dispatch.py::TestPythonDispatcher::test_basic, test/test_python_dispatch.py::TestPythonDispatcher::test_lstsq, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_cat_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_conv2d_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyCatCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyCubeCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyMulCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyMulScalarCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyNMSCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyNonzeroCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySortCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySplitCopyCustomOp_cuda_float32, 
test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyTakeCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyViewCopyCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_fft_fft2_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_mul_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_native_batch_norm_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_out_op_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_split_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_split_list_args_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_view_cuda_float32 2025-09-07T06:53:15.6328761Z 2025-09-07T06:53:15.6328954Z Running test_tensor_creation_ops 1/1 ... [2025-09-07 06:53:15.625641] 2025-09-07T06:53:15.6329313Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:53:15.6330327Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_tensor_creation_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:53:15.625942] 2025-09-07T06:53:21.1487389Z 2025-09-07T06:53:21.1488721Z test_tensor_creation_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensor_creation_ops_1.1_e045177855637eb4_.log 2025-09-07T06:53:21.1650369Z Running 531 items in this shard: test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_device_vs_cpu_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_device_vs_cpu_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_device_vs_cpu_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_device_vs_cpu_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_inference_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_lowp_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_arange_lowp_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_as_strided_neg_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_as_tensor_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_block_diag_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_block_diag_scipy_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cartesian_prod_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat2_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat2_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat2_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_all_dtypes_and_devices_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_big_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_cuda, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_empty_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_empty_legacy_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_in_channels_last_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_mem_overlap_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_misaligned_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_multi_batch_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_channels_last_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_uint16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_uint32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_uint64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_fast_path_dim0_dim1_cuda_uint8, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_out_memory_format_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_preserve_channels_last_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_size1_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_stack_cross_devices_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_cat_trailing_dim_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_combinations_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_complex_type_conversions_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_concat_empty_list_error_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_constructor_device_legacy_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_constructor_dtypes_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_ctor_with_numpy_array_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_device_rounding_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_device_rounding_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_device_rounding_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_diag_embed_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_diagflat_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dsplit_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dsplit_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dsplit_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_float16, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_dstack_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_empty_full_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_empty_overflow_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_empty_strided_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_empty_tensor_props_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_eye_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_fill_all_dtypes_and_devices_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_finite_cuda_bool, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_finite_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_finite_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_finite_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_finite_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_finite_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_nonfinite_cuda_bool, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_nonfinite_cuda_int16, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_nonfinite_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_nonfinite_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_nonfinite_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_float_to_int_conversion_nonfinite_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_from_file_shared_False_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_from_file_shared_True_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_full_inference_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_full_inference_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_full_inference_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_full_out_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hsplit_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hsplit_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hsplit_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_int64, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_hstack_column_stack_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_window_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_window_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_window_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_window_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_kaiser_window_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_large_linspace_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_large_linspace_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_like_fn_stride_proparation_vs_tensoriterator_unary_op_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linlogspace_mem_overlap_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_int8, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_deduction_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_device_vs_cpu_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_device_vs_cpu_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_device_vs_cpu_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_device_vs_cpu_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_device_vs_cpu_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_device_vs_cpu_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_special_steps_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_special_steps_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_special_steps_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_special_steps_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_special_steps_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_special_steps_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_complex_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_integral_cuda_int16, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_integral_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_integral_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_integral_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_linspace_vs_numpy_integral_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_base2_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_base2_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_base2_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_deduction_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_device_vs_cpu_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_device_vs_cpu_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_device_vs_cpu_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_special_steps_cuda_float16, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_special_steps_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_special_steps_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_vs_numpy_complex_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_vs_numpy_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_logspace_vs_numpy_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_default_indexing_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_empty_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_ij_indexing_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_ij_indexing_is_default_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_inconsistent_device_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_inconsistent_dtype_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_non_1d_tensor_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_unsupported_indexing_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_vs_numpy_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_warns_if_no_indexing_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_meshgrid_xy_indexing_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_new_empty_strided_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_new_methods_requires_grad_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_new_tensor_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_new_tensor_device_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_offset_scalar_cast_cuda, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_ones_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_bool_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_default_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_bool_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_float64, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_uint16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_uint32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_from_to_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_uint16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_uint32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_full_range_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_float32, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_uint16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_uint32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_random_to_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_range_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_range_factories_64bit_indexing_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_range_warning_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_bool, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_int8, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_refs_tensor_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_repeat_interleave_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_roll_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_bartlett_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_bartlett_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_bartlett_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_bartlett_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_bartlett_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_blackman_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_blackman_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_blackman_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_blackman_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_blackman_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hamming_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hamming_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hamming_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hamming_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hamming_cuda_int64, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hann_cuda_bfloat16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hann_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hann_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hann_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_window_functions_window_hann_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_bartlett_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_bartlett_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_blackman_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_blackman_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_cosine_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_cosine_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_hamming_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_hamming_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_hann_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_hann_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_nuttall_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_signal_windows_functions_window_nuttall_cuda_float64, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_simple_scalar_cast_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_stack_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_stack_out_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_storage_filename_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_strided_mismatched_stride_shape_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_ctor_device_inference_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_device_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_factories_empty_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_factory_copy_var_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_factory_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_factory_gpu_type_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_factory_gpu_type_inference_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_factory_type_inference_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_from_non_writable_numpy_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_tensor_from_sequence_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_bool, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_complex64, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_floating_dtype_error_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_out_dtype_error_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_out_dtype_error_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_same_dtype_error_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_complex_same_dtype_error_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_polar_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_torch_polar_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_unpack_double_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_unpack_double_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_bool, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_int16, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vander_types_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vsplit_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vsplit_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vsplit_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_complex128, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_complex64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_float64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_int32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_int8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_vstack_row_stack_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_bounds_checking_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_cuda, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_bool, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_complex64, 
test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_float16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_float32, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_int16, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_int64, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_dtype_layout_device_match_cuda_uint8, test/test_tensor_creation_ops.py::TestTensorCreationCUDA::test_zeros_out_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_normal_cuda_float32, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_normal_cuda_float64, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_normal_std_error_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_rand_cuda_complex128, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_rand_cuda_complex32, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_rand_cuda_complex64, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_rand_cuda_float32, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_rand_cuda_float64, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randint_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randint_distribution_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randint_inference_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_bfloat16, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_complex128, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_complex32, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_complex64, 
test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_float16, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_float32, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randn_cuda_float64, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_random_neg_values_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randperm_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randperm_device_compatibility_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_randperm_large_cuda, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_uniform_from_to_cuda_bfloat16, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_uniform_from_to_cuda_float16, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_uniform_from_to_cuda_float32, test/test_tensor_creation_ops.py::TestRandomTensorCreationCUDA::test_uniform_from_to_cuda_float64, test/test_tensor_creation_ops.py::TestLikeTensorCreationCUDA::test_empty_like_cuda, test/test_tensor_creation_ops.py::TestLikeTensorCreationCUDA::test_full_like_inference_cuda, test/test_tensor_creation_ops.py::TestLikeTensorCreationCUDA::test_ones_like_cuda, test/test_tensor_creation_ops.py::TestLikeTensorCreationCUDA::test_ones_like_multiple_device_cuda, test/test_tensor_creation_ops.py::TestLikeTensorCreationCUDA::test_zeros_like_cuda, test/test_tensor_creation_ops.py::TestLikeTensorCreationCUDA::test_zeros_like_multiple_device_cuda, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_bool, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_float16, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_uint16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_uint32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_uint64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_buffer_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_bfloat16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_dlpack_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_bool, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_uint16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_uint32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_uint64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_numpy_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_bfloat16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_bool, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_int64, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_alias_from_tensor_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_astensor_consistency_cuda, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_bool, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_uint16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_uint32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_uint64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_buffer_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_bfloat16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_float64, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_bfloat16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_dlpack_mult_devices_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_bool, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_float32, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_uint16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_uint32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_uint64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_numpy_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_bfloat16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_from_tensor_mult_devices_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_bfloat16, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_bool, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_list_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_bfloat16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_bool, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_complex128, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_float16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_float64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_int16, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_int32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_int64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_int8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_copy_tensor_cuda_uint8, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_default_device_cuda, 
test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_device_without_index_cuda, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_numpy_scalars_cuda, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_retain_autograd_history_cuda_complex64, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_retain_autograd_history_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_unsupported_alias_cuda_float32, test/test_tensor_creation_ops.py::TestAsArrayCUDA::test_unsupported_alias_mult_devices_cuda_float32 2025-09-07T06:53:21.1804299Z 2025-09-07T06:53:21.1804488Z Running test_autograd_fallback 1/1 ... [2025-09-07 06:53:21.149911] 2025-09-07T06:53:21.1804850Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:53:21.1805737Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd_fallback.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:53:21.150214] 2025-09-07T06:53:24.9201555Z 2025-09-07T06:53:24.9202835Z test_autograd_fallback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_fallback_1.1_99eb61685093db20_.log 2025-09-07T06:53:24.9216983Z Running 28 items in this shard: test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_warn, 
test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_warn 2025-09-07T06:53:24.9226525Z 2025-09-07T06:53:24.9226718Z Running dynamo/test_fake_distributed 1/1 ... [2025-09-07 06:53:24.920494] 2025-09-07T06:53:24.9227311Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:53:24.9228203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fake_distributed.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:53:24.920797] 2025-09-07T06:53:33.9481665Z 2025-09-07T06:53:33.9482860Z dynamo/test_fake_distributed 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fake_distributed_1.1_54247f9d69a8a834_.log 2025-09-07T06:53:33.9485140Z Running 2 items in this shard: test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_all_to_all_single_autograd, test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_device_mesh_get_local_rank 2025-09-07T06:53:33.9486374Z 2025-09-07T06:53:33.9487032Z Running inductor/test_distributed_patterns 1/1 ... [2025-09-07 06:53:33.948476] 2025-09-07T06:53:33.9487734Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:53:33.9490698Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_distributed_patterns.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:53:33.948789] 2025-09-07T06:53:40.7730921Z 2025-09-07T06:53:40.7732561Z inductor/test_distributed_patterns 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_distributed_patterns_1.1_31370cb6520c4d97_.log 2025-09-07T06:53:40.7744104Z Running 20 items in this shard: test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_fake_distributed_aot_eager, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_fake_distributed_inductor, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_intermediate_hook_with_closure, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_intermediate_hook_with_nested_closure, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_module_backward_hooks_aot, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_module_backward_hooks_eager, 
test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_module_backward_hooks_inductor, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_module_backward_hooks_multi_layers, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_nn_param_return1, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_nn_param_return2, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_nn_param_return3, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_nn_param_return4, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_storage_resize_nonzero_cpu, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_storage_resize_nonzero_gpu, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_storage_resize_zero_cpu, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_storage_resize_zero_gpu, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_unsafe_preserve_version_counter1, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_unsafe_preserve_version_counter2, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_unsafe_set_version_counter1, test/inductor/test_distributed_patterns.py::DistributedPatternTests::test_unsafe_set_version_counter2 2025-09-07T06:53:40.7758322Z 2025-09-07T06:53:40.7758491Z Running test_autocast 1/1 ... [2025-09-07 06:53:40.773471] 2025-09-07T06:53:40.7758831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:53:40.7759914Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:53:40.773779] 2025-09-07T06:53:44.4935519Z 2025-09-07T06:53:44.4936492Z test_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autocast_1.1_17ae7002b1bbd165_.log 2025-09-07T06:53:44.4944654Z Running 20 items in this shard: test/test_autocast.py::TestAutocastCPU::test_autocast_disabled_with_fp32_dtype, test/test_autocast.py::TestAutocastCPU::test_autocast_methods_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_16, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_rnn, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_16, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_need_autocast_promote, test/test_autocast.py::TestAutocastCPU::test_cpu_autocast_deprecated_warning, test/test_autocast.py::TestAutocastCPU::test_generic_autocast, test/test_autocast.py::TestAutocastGPU::test_autocast_prioritize, test/test_autocast.py::TestAutocastGPU::test_cache_disabled, test/test_autocast.py::TestAutocastGPU::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_bfloat16_supported, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_error_message, test/test_autocast.py::TestTorchAutocast::test_autocast_fast_dtype, test/test_autocast.py::TestTorchAutocast::test_invalid_device, test/test_autocast.py::TestTorchAutocast::test_non_string_device 2025-09-07T06:53:44.4949678Z 2025-09-07T06:53:44.4949825Z Running test_torch 1/1 ... 
[2025-09-07 06:53:44.494006] 2025-09-07T06:53:44.4950140Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:53:44.4950977Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_torch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:53:44.494440] 2025-09-07T06:54:21.1096137Z 2025-09-07T06:54:21.1097327Z test_torch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_torch_1.1_339cfe4f87386f2d_.log 2025-09-07T06:54:21.1351195Z Running 976 items in this shard: test/test_torch.py::TestBasicVitalSigns::test_basic_vitals, test/test_torch.py::TestBasicVitalSigns::test_basic_vitals_read_write, test/test_torch.py::TestBasicVitalSigns::test_dataloader_vitals, test/test_torch.py::TestTorch::test_RNGState, test/test_torch.py::TestTorch::test_RNGStateAliasing, test/test_torch.py::TestTorch::test_RNG_after_pickle, test/test_torch.py::TestTorch::test_Size, test/test_torch.py::TestTorch::test_Size_concat_non_tuple_sequence, test/test_torch.py::TestTorch::test_Size_concat_wildcard, test/test_torch.py::TestTorch::test_Size_iter, test/test_torch.py::TestTorch::test_Size_scalar, test/test_torch.py::TestTorch::test_add_meta_scalar, test/test_torch.py::TestTorch::test_allow_tensor_metadata_change, test/test_torch.py::TestTorch::test_apply, test/test_torch.py::TestTorch::test_as_subclass, test/test_torch.py::TestTorch::test_assert_async, test/test_torch.py::TestTorch::test_backward_hooks_traverse, test/test_torch.py::TestTorch::test_batch_norm_cpu_inference, test/test_torch.py::TestTorch::test_bf16_supported_on_cpu, test/test_torch.py::TestTorch::test_bmm_multithreaded, test/test_torch.py::TestTorch::test_boxMullerState, test/test_torch.py::TestTorch::test_cat_neg_dim, test/test_torch.py::TestTorch::test_check, test/test_torch.py::TestTorch::test_chunk_neg_dim, 
test/test_torch.py::TestTorch::test_conj_neg_tolist, test/test_torch.py::TestTorch::test_conj_physical_meta_stride, test/test_torch.py::TestTorch::test_contains, test/test_torch.py::TestTorch::test_copy_broadcast, test/test_torch.py::TestTorch::test_copy_dtypes, test/test_torch.py::TestTorch::test_copy_float16, test/test_torch.py::TestTorch::test_copy_many_to_one, test/test_torch.py::TestTorch::test_copy_transpose, test/test_torch.py::TestTorch::test_cuda_not_built, test/test_torch.py::TestTorch::test_cummax_neg_dim, test/test_torch.py::TestTorch::test_cummin_neg_dim, test/test_torch.py::TestTorch::test_cumprod_neg_dim, test/test_torch.py::TestTorch::test_cumsum_neg_dim, test/test_torch.py::TestTorch::test_cxx_flags, test/test_torch.py::TestTorch::test_data_ptr_of_empty_tensor_with_storage, test/test_torch.py::TestTorch::test_data_ptr_of_empty_view_with_storage, test/test_torch.py::TestTorch::test_deepcopy_gradient, test/test_torch.py::TestTorch::test_deepcopy_parameter, test/test_torch.py::TestTorch::test_deterministic_fill_uninitialized_memory, test/test_torch.py::TestTorch::test_deterministic_flag, test/test_torch.py::TestTorch::test_device, test/test_torch.py::TestTorch::test_dim_order, test/test_torch.py::TestTorch::test_dir, test/test_torch.py::TestTorch::test_doc, test/test_torch.py::TestTorch::test_doc_template, test/test_torch.py::TestTorch::test_dot_data_use, test/test_torch.py::TestTorch::test_dtype_is_signed, test/test_torch.py::TestTorch::test_element_size, test/test_torch.py::TestTorch::test_empty_meta, test/test_torch.py::TestTorch::test_empty_storage_view, test/test_torch.py::TestTorch::test_equal, test/test_torch.py::TestTorch::test_error_msg_type_translation, test/test_torch.py::TestTorch::test_fill_diagonal, test/test_torch.py::TestTorch::test_format_scalar_meta, test/test_torch.py::TestTorch::test_from_buffer, test/test_torch.py::TestTorch::test_from_file, test/test_torch.py::TestTorch::test_gather_neg_dim, 
test/test_torch.py::TestTorch::test_generator_cpu, test/test_torch.py::TestTorch::test_get_cpu_capability, test/test_torch.py::TestTorch::test_has_internal_overlap, test/test_torch.py::TestTorch::test_has_storage, test/test_torch.py::TestTorch::test_index_add, test/test_torch.py::TestTorch::test_index_add_all_dtypes, test/test_torch.py::TestTorch::test_index_add_cornercase, test/test_torch.py::TestTorch::test_index_add_correctness, test/test_torch.py::TestTorch::test_index_add_neg_dim, test/test_torch.py::TestTorch::test_index_copy_neg_dim, test/test_torch.py::TestTorch::test_index_fill_neg_dim, test/test_torch.py::TestTorch::test_index_select_neg_dim, test/test_torch.py::TestTorch::test_invalid_arg_error_handling, test/test_torch.py::TestTorch::test_invalid_generator_raises, test/test_torch.py::TestTorch::test_is_nonzero, test/test_torch.py::TestTorch::test_is_same_size, test/test_torch.py::TestTorch::test_iter, test/test_torch.py::TestTorch::test_kthvalue_neg_dim, test/test_torch.py::TestTorch::test_linspace_logspace, test/test_torch.py::TestTorch::test_logcumsumexp_neg_dim, test/test_torch.py::TestTorch::test_manual_seed, test/test_torch.py::TestTorch::test_map, test/test_torch.py::TestTorch::test_map2, test/test_torch.py::TestTorch::test_max_neg_dim, test/test_torch.py::TestTorch::test_mean_neg_dim, test/test_torch.py::TestTorch::test_median_neg_dim, test/test_torch.py::TestTorch::test_memory_format, test/test_torch.py::TestTorch::test_memory_format_contiguous_returns_same_tensor_if_already_satisfies, test/test_torch.py::TestTorch::test_memory_format_empty, test/test_torch.py::TestTorch::test_min_neg_dim, test/test_torch.py::TestTorch::test_mode_neg_dim, test/test_torch.py::TestTorch::test_multinomial_invalid_probs, test/test_torch.py::TestTorch::test_nanmedian_neg_dim, test/test_torch.py::TestTorch::test_narrow_neg_dim, test/test_torch.py::TestTorch::test_nbytes, test/test_torch.py::TestTorch::test_ndim, test/test_torch.py::TestTorch::test_new, 
test/test_torch.py::TestTorch::test_newaxis_numpy_comparison, test/test_torch.py::TestTorch::test_newindex, test/test_torch.py::TestTorch::test_no_cuda_monkeypatch, test/test_torch.py::TestTorch::test_norm_neg_dim, test/test_torch.py::TestTorch::test_normal_shape, test/test_torch.py::TestTorch::test_numel, test/test_torch.py::TestTorch::test_parallel_info, test/test_torch.py::TestTorch::test_parsing_double, test/test_torch.py::TestTorch::test_parsing_int64, test/test_torch.py::TestTorch::test_parsing_intlist, test/test_torch.py::TestTorch::test_permute, test/test_torch.py::TestTorch::test_pickle, test/test_torch.py::TestTorch::test_pickle_dtype, test/test_torch.py::TestTorch::test_pickle_function, test/test_torch.py::TestTorch::test_pickle_generator, test/test_torch.py::TestTorch::test_pickle_parameter, test/test_torch.py::TestTorch::test_pickle_parameter_no_requires_grad, test/test_torch.py::TestTorch::test_pickle_size, test/test_torch.py::TestTorch::test_pin_memory, test/test_torch.py::TestTorch::test_print, test/test_torch.py::TestTorch::test_prod_neg_dim, test/test_torch.py::TestTorch::test_pyobj_preserved, test/test_torch.py::TestTorch::test_qengine, test/test_torch.py::TestTorch::test_renorm_neg_dim, test/test_torch.py::TestTorch::test_resizable, test/test_torch.py::TestTorch::test_reversed, test/test_torch.py::TestTorch::test_scatter_neg_dim, test/test_torch.py::TestTorch::test_select_neg_dim, test/test_torch.py::TestTorch::test_set_flush_denormal, test/test_torch.py::TestTorch::test_setting_real_imag_to_a_number, test/test_torch.py::TestTorch::test_show_config, test/test_torch.py::TestTorch::test_size_neg_dim, test/test_torch.py::TestTorch::test_size_stride, test/test_torch.py::TestTorch::test_sizeof, test/test_torch.py::TestTorch::test_slice, test/test_torch.py::TestTorch::test_slow_test, test/test_torch.py::TestTorch::test_sobolengine_bounds, test/test_torch.py::TestTorch::test_sobolengine_bounds_scrambled, 
test/test_torch.py::TestTorch::test_sobolengine_continuing, test/test_torch.py::TestTorch::test_sobolengine_continuing_scrambled, test/test_torch.py::TestTorch::test_sobolengine_default_dtype, test/test_torch.py::TestTorch::test_sobolengine_distribution, test/test_torch.py::TestTorch::test_sobolengine_distribution_scrambled, test/test_torch.py::TestTorch::test_sobolengine_draw, test/test_torch.py::TestTorch::test_sobolengine_draw_base2, test/test_torch.py::TestTorch::test_sobolengine_draw_base2_scrambled, test/test_torch.py::TestTorch::test_sobolengine_draw_scrambled, test/test_torch.py::TestTorch::test_sobolengine_fast_forward, test/test_torch.py::TestTorch::test_sobolengine_fast_forward_scrambled, test/test_torch.py::TestTorch::test_sobolengine_first_point, test/test_torch.py::TestTorch::test_sobolengine_high_dim, test/test_torch.py::TestTorch::test_sobolengine_raise, test/test_torch.py::TestTorch::test_sobolengine_reset, test/test_torch.py::TestTorch::test_sobolengine_reset_scrambled, test/test_torch.py::TestTorch::test_sort_neg_dim, test/test_torch.py::TestTorch::test_split_neg_dim, test/test_torch.py::TestTorch::test_split_with_sizes_copy_out, test/test_torch.py::TestTorch::test_squeeze_neg_dim, test/test_torch.py::TestTorch::test_std_neg_dim, test/test_torch.py::TestTorch::test_storage_base_init, test/test_torch.py::TestTorch::test_storage_base_new, test/test_torch.py::TestTorch::test_storage_byteswap, test/test_torch.py::TestTorch::test_storage_casts, test/test_torch.py::TestTorch::test_storage_cycle_via_dict, test/test_torch.py::TestTorch::test_storage_cycle_via_slots, test/test_torch.py::TestTorch::test_storage_dead_weak_ref, test/test_torch.py::TestTorch::test_storage_dealloc, test/test_torch.py::TestTorch::test_storage_dealloc_resurrected, test/test_torch.py::TestTorch::test_storage_dealloc_subclass_resurrected, test/test_torch.py::TestTorch::test_storage_dealloc_subclass_zombie, test/test_torch.py::TestTorch::test_storage_dict_dealloc, 
test/test_torch.py::TestTorch::test_storage_error, test/test_torch.py::TestTorch::test_storage_error_no_attribute, test/test_torch.py::TestTorch::test_storage_finalizer_dealloc, test/test_torch.py::TestTorch::test_storage_fix_weakref_no_leak, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc_resurrected, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc_zombie, test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context, test/test_torch.py::TestTorch::test_storage_resurrected_weak_ref, test/test_torch.py::TestTorch::test_storage_slot_dealloc, test/test_torch.py::TestTorch::test_storage_weakref_dealloc, test/test_torch.py::TestTorch::test_structseq_repr, test/test_torch.py::TestTorch::test_subclass_preserved, test/test_torch.py::TestTorch::test_subclass_tensors, test/test_torch.py::TestTorch::test_sum_neg_dim, test/test_torch.py::TestTorch::test_swap_basic, test/test_torch.py::TestTorch::test_swap_fail_slots, test/test_torch.py::TestTorch::test_t_not_2d_error, test/test_torch.py::TestTorch::test_tensor_base_init, test/test_torch.py::TestTorch::test_tensor_base_new, test/test_torch.py::TestTorch::test_tensor_ctor_scalar, test/test_torch.py::TestTorch::test_tensor_cycle_via_dict, test/test_torch.py::TestTorch::test_tensor_cycle_via_slots, test/test_torch.py::TestTorch::test_tensor_dead_weak_ref, test/test_torch.py::TestTorch::test_tensor_dict_dealloc, test/test_torch.py::TestTorch::test_tensor_finalizer_dealloc, test/test_torch.py::TestTorch::test_tensor_fix_weakref_no_leak, test/test_torch.py::TestTorch::test_tensor_item_no_warning, test/test_torch.py::TestTorch::test_tensor_ressurecting_clear, test/test_torch.py::TestTorch::test_tensor_resurrected_weak_ref, test/test_torch.py::TestTorch::test_tensor_set, test/test_torch.py::TestTorch::test_tensor_set_errors, test/test_torch.py::TestTorch::test_tensor_slot_dealloc, 
test/test_torch.py::TestTorch::test_tensor_weakref_dealloc, test/test_torch.py::TestTorch::test_tensor_where_scalar, test/test_torch.py::TestTorch::test_tensor_with_grad_to_scalar_warning, test/test_torch.py::TestTorch::test_tensoriterator_output_setup, test/test_torch.py::TestTorch::test_terminate_handler_on_crash, test/test_torch.py::TestTorch::test_to, test/test_torch.py::TestTorch::test_to_with_tensor, test/test_torch.py::TestTorch::test_topk_neg_dim, test/test_torch.py::TestTorch::test_torch_from_file, test/test_torch.py::TestTorch::test_transpose_neg_dim, test/test_torch.py::TestTorch::test_type, test/test_torch.py::TestTorch::test_type_alias, test/test_torch.py::TestTorch::test_type_conversion_via_dtype_name, test/test_torch.py::TestTorch::test_typed_storage_deprecation_warning, test/test_torch.py::TestTorch::test_typed_storage_internal_no_warning, test/test_torch.py::TestTorch::test_unbind_neg_dim, test/test_torch.py::TestTorch::test_unflatten, test/test_torch.py::TestTorch::test_unfold_neg_dim, test/test_torch.py::TestTorch::test_unsqueeze_neg_dim, test/test_torch.py::TestTorch::test_upsample_nearest1d_meta, test/test_torch.py::TestTorch::test_upsample_nearest2d_meta, test/test_torch.py::TestTorch::test_var_neg_dim, test/test_torch.py::TestTorch::test_warn_types, test/test_torch.py::TestTorch::test_wildcard_import, test/test_torch.py::TestVitalSignsCudaCUDA::test_cuda_vitals_gpu_only_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test__local_scalar_dense_with_empty_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_cuda_errors_with_cpu_scalars_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_assertRaisesRegex_ignore_msg_non_native_device_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bfloat16_neg_abs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bool_tensor_value_change_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_add_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_addcdiv_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_addcmul_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_atan2_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_copy_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_dist_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_div_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_eq_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_fmod_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_ge_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_gt_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_le_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_lerp_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_lt_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_map2_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_map_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_fill_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_scatter_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_select_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_max_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_min_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_mul_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_ne_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_pow_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_remainder_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_sub_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_complex128, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_no_inf_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_no_inf_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_cuda_backward_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_euclidean_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_grad_p_lt_1_no_nan_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_large_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_non_contiguous_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_non_contiguous_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_norm_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_norm_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_same_inputs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_check_tensor_all_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_check_tensor_internal_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_all_dtypes_and_devices_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_not_memory_dense_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_zero_stride_dim_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_complex_half_experimental_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_constants_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_conv_transposed_backward_agnostic_to_memory_format_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_conv_transposed_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_all_dtypes_and_devices_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_math_view_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_mem_overlap_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cov_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cpp_warnings_have_python_context_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cublas_config_nondeterministic_alert_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummax_cummin_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummax_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummin_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumprod_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_64bit_indexing_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex128, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_replication_pad2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_device_guard_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dim_function_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_discontiguous_out_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dist_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dtypetensor_warnings_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_expected_failure_xla_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_no_zero_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_no_zero_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gather_backward_deterministic_path_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gather_backward_one_dim_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scale_will_not_overflow_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaler_deprecated_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaler_pass_itself_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_accumulation_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_clipping_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_clipping_separate_unscale_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_multiple_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_penalty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_state_dict_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_sparse_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_update_scale_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_type_promotion_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_hook_remove_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_add_large_inputs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_add_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_copy_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_fill_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_put_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_int64_upsample3d_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_invalid_shapes_grid_sampler_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_is_set_to_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_is_signed_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e4m3fn, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e4m3fnuz, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e5m2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e5m2fnuz, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_large_cumprod_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_large_cumsum_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_bool, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_uint8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_logcumsumexp_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lognormal_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_bool_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bfloat16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bfloat16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bool_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bool_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex128_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex128_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float32_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float32_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int32_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int32_uint8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int8_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int8_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_uint8_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_uint8_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_bool_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_inplace_noncontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_large_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_complex128, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_clone_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_consistency_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_cpu_and_cuda_ops_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_empty_like_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_factory_like_functions_preserve_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_operators_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_preserved_after_permute_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_propagation_rules_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_to_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_type_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_type_shortcuts_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_module_share_memory_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_device_constrain_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_empty_w_replacement_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_empty_wo_replacement_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_gpu_device_constrain_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_rng_state_advance_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_narrow_copy_non_contiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_narrow_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_no_nondeterministic_alert_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_no_nondeterministic_alert_interpolate_trilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveAvgPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveAvgPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveMaxPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AvgPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_CTCLoss_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_EmbeddingBag_max_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_FractionalMaxPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_FractionalMaxPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_NLLLoss_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReflectionPad1d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReflectionPad3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad1d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_bincount_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_grid_sample_2d_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_grid_sample_3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_bicubic_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_linear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_trilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_kthvalue_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_median_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_put_accumulate_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_put_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_qint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_qint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint2x4, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint4x2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nullary_op_mem_overlap_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_pairwise_distance_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_Adam_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pdist_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pdist_norm_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pickle_gradscaler_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pin_memory_from_constructor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_reduced_type_float_copy_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_reduced_type_float_copy_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_repeat_interleave_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scalar_check_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_bool_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_non_unique_index_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_one_dim_deterministic_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_to_large_input_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_bool_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_multiply_unsupported_dtypes_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_multiply_unsupported_dtypes_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_complex64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_to_large_input_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_zero_size_index_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_serialization_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_default_tensor_type_warnings_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_shift_mem_overlap_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_skip_xla_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_all_devices_non_blocking_False_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_all_devices_non_blocking_True_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_qint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_qint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_quint4x2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_quint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_strides_propagation_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_sync_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_set_errors_multigpu_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_shape_empty_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_type_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_ternary_op_mem_overlap_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_untyped_storage_meta_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_warn_always_caught_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_where_scalar_handcrafted_values_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_advancedindex_mixed_cpu_devices_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_advancedindex_mixed_devices_error_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_float32, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_float64, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_int64, test/test_torch.py::TestDevicePrecisionCUDA::test_copy_broadcast_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_copy_noncontig_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_cuda_device_idx_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_device_serialization_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float16, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float32, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float64, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int16, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int32, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int64, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int8, 
test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_uint8, test/test_torch.py::TestDevicePrecisionCUDA::test_index_add_bfloat16_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_multidevice_serialization_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_type_conversions_same_device_cuda 2025-09-07T06:54:21.1584896Z 2025-09-07T06:54:21.1585138Z Running functorch/test_memory_efficient_fusion 1/1 ... [2025-09-07 06:54:21.111173] 2025-09-07T06:54:21.1585555Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:21.1586502Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_memory_efficient_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:54:21.111495] 2025-09-07T06:54:25.0818605Z 2025-09-07T06:54:25.0819849Z functorch/test_memory_efficient_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_memory_efficient_fusion_1.1_ef9321c79bbeddbf_.log 2025-09-07T06:54:25.0833003Z Running 22 items in this shard: test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_gelu_bias, test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_hard_sigmoid, test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_hard_swish, test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_layer_norm, test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_mish, test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_rmsnorm, test/functorch/test_memory_efficient_fusion.py::TestMemoryEfficientOpAuthoring::test_swish, test/functorch/test_memory_efficient_fusion.py::NoChangeTestCase::test_empty, test/functorch/test_memory_efficient_fusion.py::NoChangeTestCase::test_hash_with_numbers, 
test/functorch/test_memory_efficient_fusion.py::NoChangeTestCase::test_nochange, test/functorch/test_memory_efficient_fusion.py::NoChangeTestCase::test_rand_like, test/functorch/test_memory_efficient_fusion.py::NoChangeTestCase::test_rand_n, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_immutable_list_multiple_entries, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_immutable_list_type, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_kwarg, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_nested_immutable_list_type, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_simple, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_simple_2, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_simple_multiple_same_ops, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_two_args, test/functorch/test_memory_efficient_fusion.py::ReduceTestCase::test_two_args_default, test/functorch/test_memory_efficient_fusion.py::RandomOpTestCase::test_random 2025-09-07T06:54:25.0839734Z 2025-09-07T06:54:25.0839909Z Running test_sort_and_select 1/1 ... [2025-09-07 06:54:25.082242] 2025-09-07T06:54:25.0840257Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:25.0841327Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sort_and_select.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:54:25.082539] 2025-09-07T06:54:29.4035438Z 2025-09-07T06:54:29.4036855Z test_sort_and_select 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sort_and_select_1.1_bbe335f8bfc99432_.log 2025-09-07T06:54:29.4070140Z Running 111 items in this shard: test/test_sort_and_select.py::TestSortAndSelectCUDA::test_complex_unsupported_cpu_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_dtypes_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_kthvalue_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_kthvalue_scalar_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_bfloat16, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_output_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_discontiguous_slow_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_expanded_tensor_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_slice_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int8, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_restride_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_stable_none_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int8, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_1d_output_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_4d_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_arguments_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_lower_precision_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_lower_precision_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_noncontiguous_gpu_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_quantized_scalar_input_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int32, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_dim_cuda 2025-09-07T06:54:29.4099631Z 2025-09-07T06:54:29.4099821Z Running test_cpp_extensions_jit 1/1 ... 
[2025-09-07 06:54:29.403813] 2025-09-07T06:54:29.4100192Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:29.4101061Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_jit.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:54:29.404123] 2025-09-07T06:54:33.1743028Z 2025-09-07T06:54:33.1744758Z test_cpp_extensions_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_jit_1.1_32f473b115bb2924_.log 2025-09-07T06:54:33.1757804Z Running 34 items in this shard: test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_aoti_torch_call_dispatcher, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_autograd_from_cpp, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_compilation_error_formatting, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_has_same_output_as_python, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_has_up_to_date_attributes, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_python_inter_op, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_python_inter_op_with_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_arch_flags_default_gencode, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_arch_flags_non_default_gencode, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_pluggable_allocator_include, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_custom_compound_op_autograd, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_custom_functorch_error, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_gen_extension_h_pch, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_half_support, 
test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_custom_op_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_multiple_sources_and_no_functions, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_throws_when_functions_is_bad, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_with_functions_as_dict, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_with_functions_as_list, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_xpu, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_compile_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cuda_archflags, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cuda_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cudnn_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_xpu_archlists, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_xpu_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_lenient_flag_handling_in_jit_extensions, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_load_with_non_platform_default_encoding, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_mps_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_reload_jit_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_returns_shared_library_path_when_is_python_module_is_true, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_set_default_type_also_changes_aten_default_type, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_warning 2025-09-07T06:54:33.1768064Z 2025-09-07T06:54:33.1768229Z Running test_native_mha 1/1 ... 
[2025-09-07 06:54:33.174536] 2025-09-07T06:54:33.1768561Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:33.1769392Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_native_mha.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:54:33.174834] 2025-09-07T06:54:37.2952243Z 2025-09-07T06:54:37.2953525Z test_native_mha 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_native_mha_1.1_cbce4a870c43ff4e_.log 2025-09-07T06:54:37.2989859Z Running 54 items in this shard: test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_attention_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_attention_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_encoder_decoder_attention_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_encoder_decoder_attention_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_transform_bias_rescale_qkv_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_transform_bias_rescale_qkv_nested_cuda_float32 2025-09-07T06:54:37.3021224Z 2025-09-07T06:54:37.3021412Z Running test_cuda_primary_ctx 1/1 ... [2025-09-07 06:54:37.295567] 2025-09-07T06:54:37.3021795Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:37.3022703Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_primary_ctx.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:54:37.295860] 2025-09-07T06:54:56.1366833Z 2025-09-07T06:54:56.1368113Z test_cuda_primary_ctx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_primary_ctx_1.1_f34c5e1631f49891_.log 2025-09-07T06:54:56.1370667Z Running 4 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_copy, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_pin_memory, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_set_device_0, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_str_repr 2025-09-07T06:54:56.1372833Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_copy 2025-09-07T06:54:56.1373993Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_pin_memory 2025-09-07T06:54:56.1375133Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_set_device_0 2025-09-07T06:54:56.1376350Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_str_repr 2025-09-07T06:54:56.1377083Z 2025-09-07T06:54:56.1377368Z Running test_nn 1/1 ... [2025-09-07 06:54:56.137087] 2025-09-07T06:54:56.1377979Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:56.1379219Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:54:56.137423] 2025-09-07T06:56:10.9541089Z 2025-09-07T06:56:10.9543209Z test_nn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_nn_1.1_dbd0853227cfdfeb_.log 2025-09-07T06:56:11.0494655Z Running 2254 items in this shard: test/test_nn.py::TestNN::test_AdaptiveLogSoftmax, test/test_nn.py::TestNN::test_AdaptiveLogSoftmax_cuda, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_reduce, test/test_nn.py::TestNN::test_BCELoss_no_reduce_cuda, test/test_nn.py::TestNN::test_BCELoss_no_reduce_scalar, test/test_nn.py::TestNN::test_BCELoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_scalar, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_legacy_enum, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_legacy_enum_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_float, 
test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_scalar, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_CELU_no_batch_dim, test/test_nn.py::TestNN::test_CELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_CTCLoss_critical_target_len, test/test_nn.py::TestNN::test_CTCLoss_lengthchecks_cpu, test/test_nn.py::TestNN::test_CTCLoss_lengthchecks_cuda, test/test_nn.py::TestNN::test_CTCLoss_long_targets, test/test_nn.py::TestNN::test_CTCLoss_typechecks, test/test_nn.py::TestNN::test_CTCLoss_zero_infinity, test/test_nn.py::TestNN::test_CTCLoss_zero_lengths, test/test_nn.py::TestNN::test_Conv1d, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv1d_cuda, test/test_nn.py::TestNN::test_Conv1d_dilated, test/test_nn.py::TestNN::test_Conv1d_dilated_cuda, test/test_nn.py::TestNN::test_Conv1d_groups, test/test_nn.py::TestNN::test_Conv1d_groups_cuda, test/test_nn.py::TestNN::test_Conv1d_pad1, test/test_nn.py::TestNN::test_Conv1d_pad1_cuda, test/test_nn.py::TestNN::test_Conv1d_pad1size1, 
test/test_nn.py::TestNN::test_Conv1d_pad1size1_cuda, test/test_nn.py::TestNN::test_Conv1d_pad2, test/test_nn.py::TestNN::test_Conv1d_pad2_cuda, test/test_nn.py::TestNN::test_Conv1d_pad2size1, test/test_nn.py::TestNN::test_Conv1d_pad2size1_cuda, test/test_nn.py::TestNN::test_Conv1d_pad_same, test/test_nn.py::TestNN::test_Conv1d_pad_same2, test/test_nn.py::TestNN::test_Conv1d_pad_same2_cuda, test/test_nn.py::TestNN::test_Conv1d_pad_same_cuda, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated_cuda, test/test_nn.py::TestNN::test_Conv1d_pad_valid, test/test_nn.py::TestNN::test_Conv1d_pad_valid_cuda, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv1d_stride, test/test_nn.py::TestNN::test_Conv1d_stride_cuda, test/test_nn.py::TestNN::test_Conv1d_zero_batch, test/test_nn.py::TestNN::test_Conv1d_zero_batch_cuda, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv2d, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv2d_cuda, test/test_nn.py::TestNN::test_Conv2d_depthwise, test/test_nn.py::TestNN::test_Conv2d_depthwise_cuda, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated_cuda, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded_cuda, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided_cuda, test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier, 
test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier_cuda, test/test_nn.py::TestNN::test_Conv2d_dilated, test/test_nn.py::TestNN::test_Conv2d_dilated_cuda, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_groups, test/test_nn.py::TestNN::test_Conv2d_groups_cuda, test/test_nn.py::TestNN::test_Conv2d_groups_thnn, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_cuda, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_no_bias, test/test_nn.py::TestNN::test_Conv2d_no_bias_cuda, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_pad_same, test/test_nn.py::TestNN::test_Conv2d_pad_same_cuda, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated_cuda, test/test_nn.py::TestNN::test_Conv2d_pad_valid, test/test_nn.py::TestNN::test_Conv2d_pad_valid_cuda, test/test_nn.py::TestNN::test_Conv2d_padding, test/test_nn.py::TestNN::test_Conv2d_padding_cuda, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv2d_strided, test/test_nn.py::TestNN::test_Conv2d_strided_cuda, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor, 
test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_zero_batch, test/test_nn.py::TestNN::test_Conv2d_zero_batch_cuda, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv3d, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_cuda, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv3d_cuda, test/test_nn.py::TestNN::test_Conv3d_dilated, test/test_nn.py::TestNN::test_Conv3d_dilated_cuda, test/test_nn.py::TestNN::test_Conv3d_dilated_strided, test/test_nn.py::TestNN::test_Conv3d_dilated_strided_cuda, test/test_nn.py::TestNN::test_Conv3d_groups, test/test_nn.py::TestNN::test_Conv3d_groups_cuda, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_no_bias, test/test_nn.py::TestNN::test_Conv3d_no_bias_cuda, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_pad_same, test/test_nn.py::TestNN::test_Conv3d_pad_same_cuda, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated_cuda, test/test_nn.py::TestNN::test_Conv3d_pad_valid, test/test_nn.py::TestNN::test_Conv3d_pad_valid_cuda, 
test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2_cuda, test/test_nn.py::TestNN::test_Conv3d_stride, test/test_nn.py::TestNN::test_Conv3d_stride_cuda, test/test_nn.py::TestNN::test_Conv3d_stride_padding, test/test_nn.py::TestNN::test_Conv3d_stride_padding_cuda, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_zero_batch, test/test_nn.py::TestNN::test_Conv3d_zero_batch_cuda, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor_cuda, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2_cuda, test/test_nn.py::TestNN::test_ConvTranspose1d, test/test_nn.py::TestNN::test_ConvTranspose1d_cuda, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated_cuda, test/test_nn.py::TestNN::test_ConvTranspose1d_groups, test/test_nn.py::TestNN::test_ConvTranspose1d_groups_cuda, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d, test/test_nn.py::TestNN::test_ConvTranspose2d_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d_groups, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda, 
test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor_cuda, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor_cuda, test/test_nn.py::TestNN::test_ConvTranspose3d, test/test_nn.py::TestNN::test_ConvTranspose3d_cuda, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated_cuda, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_CrossMapLRN2d, test/test_nn.py::TestNN::test_CrossMapLRN2d_cuda, test/test_nn.py::TestNN::test_ELU_no_batch_dim, test/test_nn.py::TestNN::test_ELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Embedding, test/test_nn.py::TestNN::test_EmbeddingBag_discontiguous, 
test/test_nn.py::TestNN::test_EmbeddingBag_discontiguous_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_max, test/test_nn.py::TestNN::test_EmbeddingBag_max_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_max_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_max_padding_idx_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_mean, test/test_nn.py::TestNN::test_EmbeddingBag_mean_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_mean_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_mean_padding_idx_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sparse, test/test_nn.py::TestNN::test_EmbeddingBag_sparse_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sum, test/test_nn.py::TestNN::test_EmbeddingBag_sum_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sum_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_sum_padding_idx_cuda, test/test_nn.py::TestNN::test_Embedding_cuda, test/test_nn.py::TestNN::test_Embedding_discontiguous, test/test_nn.py::TestNN::test_Embedding_discontiguous_cuda, test/test_nn.py::TestNN::test_Embedding_sparse, test/test_nn.py::TestNN::test_Embedding_sparse_cuda, test/test_nn.py::TestNN::test_Flatten, test/test_nn.py::TestNN::test_Flatten_cuda, test/test_nn.py::TestNN::test_Flatten_no_batch_dim, test/test_nn.py::TestNN::test_Flatten_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Fold, test/test_nn.py::TestNN::test_Fold_cuda, test/test_nn.py::TestNN::test_Fold_int_input, test/test_nn.py::TestNN::test_Fold_int_input_cuda, test/test_nn.py::TestNN::test_Fold_no_batch_dim_input, test/test_nn.py::TestNN::test_Fold_no_batch_dim_input_cuda, test/test_nn.py::TestNN::test_Fold_no_batch_dim_int_input, test/test_nn.py::TestNN::test_Fold_no_batch_dim_int_input_cuda, test/test_nn.py::TestNN::test_GELU_no_batch_dim, test/test_nn.py::TestNN::test_GELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_GLU_no_batch_dim, test/test_nn.py::TestNN::test_GLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardshrink_no_batch_dim, 
test/test_nn.py::TestNN::test_Hardshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardsigmoid_no_batch_dim, test/test_nn.py::TestNN::test_Hardsigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardswish_no_batch_dim, test/test_nn.py::TestNN::test_Hardswish_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardtanh_no_batch_dim, test/test_nn.py::TestNN::test_Hardtanh_no_batch_dim_cuda, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_margin_no_reduce, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_reduce, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_HuberLoss_delta, test/test_nn.py::TestNN::test_HuberLoss_delta_cuda, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_half, 
test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_batch_mean, test/test_nn.py::TestNN::test_KLDivLoss_batch_mean_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_log_target_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_log_target_cuda, 
test/test_nn.py::TestNN::test_KLDivLoss_with_log_target_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_with_log_target_no_reduce_cuda, test/test_nn.py::TestNN::test_KLDivLoss_with_target_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_with_target_no_reduce_cuda, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_reduce, test/test_nn.py::TestNN::test_L1Loss_no_reduce_complex, test/test_nn.py::TestNN::test_L1Loss_no_reduce_complex_cuda, test/test_nn.py::TestNN::test_L1Loss_no_reduce_cuda, test/test_nn.py::TestNN::test_L1Loss_no_reduce_scalar, test/test_nn.py::TestNN::test_L1Loss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_LSTM_cell, test/test_nn.py::TestNN::test_LSTM_cell_forward_hidden_size, test/test_nn.py::TestNN::test_LSTM_cell_forward_input_size, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_cuda, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_eval, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_eval_cuda, test/test_nn.py::TestNN::test_LeakyReLU_no_batch_dim, test/test_nn.py::TestNN::test_LeakyReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Linear, 
test/test_nn.py::TestNN::test_Linear_cuda, test/test_nn.py::TestNN::test_Linear_no_batch_dim, test/test_nn.py::TestNN::test_Linear_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Linear_no_bias, test/test_nn.py::TestNN::test_Linear_no_bias_cuda, test/test_nn.py::TestNN::test_LogSigmoid_no_batch_dim, test/test_nn.py::TestNN::test_LogSigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_reduce, test/test_nn.py::TestNN::test_MSELoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MSELoss_no_reduce_scalar, test/test_nn.py::TestNN::test_MSELoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_half, 
test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MaxUnpool1d_net, test/test_nn.py::TestNN::test_MaxUnpool1d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool1d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool1d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MaxUnpool2d_net, test/test_nn.py::TestNN::test_MaxUnpool2d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool2d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool2d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MaxUnpool3d_net, test/test_nn.py::TestNN::test_MaxUnpool3d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool3d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool3d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Mish_no_batch_dim, test/test_nn.py::TestNN::test_Mish_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ModuleDict, test/test_nn.py::TestNN::test_ModuleList, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_0d_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_1d_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_index_neg, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_index_neg_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none, 
test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, 
test/test_nn.py::TestNN::test_MultiMarginLoss_1d_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_1d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_margin_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_margin_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_p_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_p_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_weights_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum, 
test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_reduce, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_nn.py::TestNN::test_PReLU_backward_requires_grad_false, test/test_nn.py::TestNN::test_PReLU_no_batch_dim, test/test_nn.py::TestNN::test_PReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_PairwiseDistance, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_lhs, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_lhs_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_rhs, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_rhs_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_no_batch_dim, test/test_nn.py::TestNN::test_PairwiseDistance_no_batch_dim_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_with_non_default_args, test/test_nn.py::TestNN::test_PairwiseDistance_with_non_default_args_cuda, test/test_nn.py::TestNN::test_ParameterDict, test/test_nn.py::TestNN::test_ParameterDict_replication, test/test_nn.py::TestNN::test_ParameterList, test/test_nn.py::TestNN::test_ParameterList_meta, test/test_nn.py::TestNN::test_ParameterList_replication, test/test_nn.py::TestNN::test_PixelShuffle, test/test_nn.py::TestNN::test_PixelShuffle_cuda, 
test/test_nn.py::TestNN::test_PixelUnshuffle, test/test_nn.py::TestNN::test_PixelUnshuffle_cuda, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_reduce, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_RNN_cell, test/test_nn.py::TestNN::test_RNN_cell_forward_zero_hidden_size, test/test_nn.py::TestNN::test_RNN_cell_no_broadcasting, test/test_nn.py::TestNN::test_RNN_change_dropout, test/test_nn.py::TestNN::test_RNN_cpu_vs_cudnn_no_dropout, test/test_nn.py::TestNN::test_RNN_cpu_vs_cudnn_with_dropout, test/test_nn.py::TestNN::test_RNN_cudnn_weight_norm, test/test_nn.py::TestNN::test_RNN_dropout, test/test_nn.py::TestNN::test_RNN_dropout_state, test/test_nn.py::TestNN::test_RNN_input_size_zero, test/test_nn.py::TestNN::test_RNN_nonlinearity, test/test_nn.py::TestNN::test_RNN_nonlinearity_passed_as_arg, test/test_nn.py::TestNN::test_RReLU, test/test_nn.py::TestNN::test_RReLU_cuda, test/test_nn.py::TestNN::test_RReLU_no_batch_dim, test/test_nn.py::TestNN::test_RReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_RReLU_with_up_down, 
test/test_nn.py::TestNN::test_RReLU_with_up_down_cuda, test/test_nn.py::TestNN::test_RReLU_with_up_down_scalar, test/test_nn.py::TestNN::test_RReLU_with_up_down_scalar_cuda, test/test_nn.py::TestNN::test_ReLU6_no_batch_dim, test/test_nn.py::TestNN::test_ReLU6_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ReLU_no_batch_dim, test/test_nn.py::TestNN::test_ReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d, test/test_nn.py::TestNN::test_ReplicationPad3d_complex, test/test_nn.py::TestNN::test_ReplicationPad3d_complex_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d_no_batch_dim, test/test_nn.py::TestNN::test_ReplicationPad3d_no_batch_dim_cuda, test/test_nn.py::TestNN::test_SELU_no_batch_dim, test/test_nn.py::TestNN::test_SELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Sequential_add, test/test_nn.py::TestNN::test_Sequential_append, test/test_nn.py::TestNN::test_Sequential_delitem, test/test_nn.py::TestNN::test_Sequential_extend, test/test_nn.py::TestNN::test_Sequential_getitem, test/test_nn.py::TestNN::test_Sequential_iadd, test/test_nn.py::TestNN::test_Sequential_imul, test/test_nn.py::TestNN::test_Sequential_insert, test/test_nn.py::TestNN::test_Sequential_insert_fail_case, test/test_nn.py::TestNN::test_Sequential_mul, test/test_nn.py::TestNN::test_Sequential_pop, test/test_nn.py::TestNN::test_Sequential_rmul, test/test_nn.py::TestNN::test_Sequential_setitem, test/test_nn.py::TestNN::test_Sequential_setitem_named, test/test_nn.py::TestNN::test_SiLU_no_batch_dim, test/test_nn.py::TestNN::test_SiLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Sigmoid_no_batch_dim, test/test_nn.py::TestNN::test_Sigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_beta, test/test_nn.py::TestNN::test_SmoothL1Loss_beta_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_double, 
test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_scalar, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_zero_beta, test/test_nn.py::TestNN::test_SmoothL1Loss_zero_beta_cuda, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_reduce, 
test/test_nn.py::TestNN::test_SoftMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_Softplus_no_batch_dim, test/test_nn.py::TestNN::test_Softplus_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Softshrink_no_batch_dim, test/test_nn.py::TestNN::test_Softshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Softsign_no_batch_dim, test/test_nn.py::TestNN::test_Softsign_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Tanh_no_batch_dim, test/test_nn.py::TestNN::test_Tanh_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Tanhshrink_no_batch_dim, test/test_nn.py::TestNN::test_Tanhshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Threshold_no_batch_dim, test/test_nn.py::TestNN::test_Threshold_no_batch_dim_cuda, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation_cuda, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation_cuda, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation_cuda, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation_cuda, test/test_nn.py::TestNN::test_Transformer_cell, test/test_nn.py::TestNN::test_Transformer_multilayer_coder, test/test_nn.py::TestNN::test_Transformer_multilayer_coder_cuda, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_float, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_double, 
test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_float, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_float, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_Unflatten_no_batch_dim, test/test_nn.py::TestNN::test_Unflatten_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Unfold, test/test_nn.py::TestNN::test_Unfold_cuda, test/test_nn.py::TestNN::test_Unfold_int_input, test/test_nn.py::TestNN::test_Unfold_int_input_cuda, test/test_nn.py::TestNN::test_adaptive_log_softmax, test/test_nn.py::TestNN::test_add_module, test/test_nn.py::TestNN::test_add_module_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_affine_grid, test/test_nn.py::TestNN::test_affine_grid_3d, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cpu_nd_2, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cpu_nd_3, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cuda_nd_2, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cuda_nd_3, test/test_nn.py::TestNN::test_affine_grid_error_checking, test/test_nn.py::TestNN::test_assignment, test/test_nn.py::TestNN::test_batch_norm_update_stats, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_mixed_float16, 
test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_buffer_update_when_stats_are_not_tracked, test/test_nn.py::TestNN::test_batchnorm_cudnn_half, test/test_nn.py::TestNN::test_batchnorm_cudnn_nhwc, test/test_nn.py::TestNN::test_batchnorm_half_overflow, test/test_nn.py::TestNN::test_batchnorm_load_state_dict, test/test_nn.py::TestNN::test_batchnorm_nhwc_cpu, test/test_nn.py::TestNN::test_batchnorm_nhwc_cuda, test/test_nn.py::TestNN::test_batchnorm_non_contig_cpu_BatchNorm2d, test/test_nn.py::TestNN::test_batchnorm_non_contig_cpu_SyncBatchNorm, test/test_nn.py::TestNN::test_batchnorm_nonaffine_cuda_half_input, 
test/test_nn.py::TestNN::test_batchnorm_raises_error_if_bias_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_less_than_one_value_per_channel, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_mean_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_var_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_var_or_running_mean_have_forward_grad, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_weight_is_not_same_size_as_input, test/test_nn.py::TestNN::test_bce_loss_always_nonnegative, test/test_nn.py::TestNN::test_bce_loss_broadcasts_weights, test/test_nn.py::TestNN::test_bce_loss_input_range, test/test_nn.py::TestNN::test_bce_loss_size_mismatch, test/test_nn.py::TestNN::test_bce_with_logits_broadcasts_pos_weights, test/test_nn.py::TestNN::test_bce_with_logits_broadcasts_weights, test/test_nn.py::TestNN::test_bce_with_logits_gives_same_result_as_sigmoid_and_bce_loss, test/test_nn.py::TestNN::test_bce_with_logits_gives_same_result_as_sigmoid_and_bce_loss_large_tensors_with_grad, test/test_nn.py::TestNN::test_bce_with_logits_has_correct_forward_grad, test/test_nn.py::TestNN::test_bce_with_logits_has_correct_grad_at_zero, test/test_nn.py::TestNN::test_bce_with_logits_ones_in_pos_weights_are_the_same_as_none, test/test_nn.py::TestNN::test_bce_with_logits_raises_if_target_and_input_are_different_size, test/test_nn.py::TestNN::test_bce_with_logits_stability, test/test_nn.py::TestNN::test_bce_with_logits_with_pos_weight_has_correct_grad_at_zero, test/test_nn.py::TestNN::test_bilinear, test/test_nn.py::TestNN::test_bilinear_broadcasting, test/test_nn.py::TestNN::test_bilinear_no_bias, test/test_nn.py::TestNN::test_bilinear_non_contiguous, test/test_nn.py::TestNN::test_bilinear_value_error, test/test_nn.py::TestNN::test_broadcast_double_backwards_gpu, test/test_nn.py::TestNN::test_broadcast_no_grad, 
test/test_nn.py::TestNN::test_broadcast_not_requiring_grad, test/test_nn.py::TestNN::test_buffer_bad_module_subclass, test/test_nn.py::TestNN::test_buffer_not_persistent, test/test_nn.py::TestNN::test_buffer_not_persistent_assign, test/test_nn.py::TestNN::test_buffer_not_persistent_del, test/test_nn.py::TestNN::test_buffer_not_persistent_load, test/test_nn.py::TestNN::test_buffer_not_persistent_overwrite, test/test_nn.py::TestNN::test_buffers_and_named_buffers, test/test_nn.py::TestNN::test_call_supports_python_dict_output, test/test_nn.py::TestNN::test_channel_shuffle_input_checks, test/test_nn.py::TestNN::test_channel_shuffle_return_alias_of_self, test/test_nn.py::TestNN::test_children, test/test_nn.py::TestNN::test_container_copy, test/test_nn.py::TestNN::test_convert_sync_batchnorm, test/test_nn.py::TestNN::test_cosine_embedding_loss_error_on_diff_shapes, test/test_nn.py::TestNN::test_cosine_embedding_loss_error_on_nonexpandable_shapes, test/test_nn.py::TestNN::test_cosine_embedding_loss_invalid_shape, test/test_nn.py::TestNN::test_cosine_embedding_loss_margin_no_reduce, test/test_nn.py::TestNN::test_cosine_embedding_loss_no_reduce, test/test_nn.py::TestNN::test_cosine_embedding_loss_with_diff_type, test/test_nn.py::TestNN::test_cosine_similarity, test/test_nn.py::TestNN::test_cross_entropy_loss, test/test_nn.py::TestNN::test_cross_entropy_loss_precision, test/test_nn.py::TestNN::test_cross_entropy_loss_zero_div, test/test_nn.py::TestNN::test_cudnn_forward_exception, test/test_nn.py::TestNN::test_cudnn_rnn_dropout_states_device, test/test_nn.py::TestNN::test_cudnn_weight_format, test/test_nn.py::TestNN::test_cudnn_weight_tying, test/test_nn.py::TestNN::test_dir, test/test_nn.py::TestNN::test_dir_digit, test/test_nn.py::TestNN::test_elu_inplace_gradgrad, test/test_nn.py::TestNN::test_elu_inplace_on_view, test/test_nn.py::TestNN::test_error_RNN_seq_len_zero, test/test_nn.py::TestNN::test_extra_state, 
test/test_nn.py::TestNN::test_extra_state_missing_get_extra_state, test/test_nn.py::TestNN::test_extra_state_missing_set_extra_state, test/test_nn.py::TestNN::test_extra_state_non_dict, test/test_nn.py::TestNN::test_fb_fc_packed, test/test_nn.py::TestNN::test_flatten, test/test_nn.py::TestNN::test_fold_invalid_arg, test/test_nn.py::TestNN::test_fractional_max_pool2d_invalid_output_ratio, test/test_nn.py::TestNN::test_gaussian_nll_loss_args, test/test_nn.py::TestNN::test_gaussian_nll_loss_broadcasting, test/test_nn.py::TestNN::test_gaussian_nll_loss_scalar_var, test/test_nn.py::TestNN::test_get_buffer, test/test_nn.py::TestNN::test_get_buffer_from_submodules, test/test_nn.py::TestNN::test_getattr_with_property, test/test_nn.py::TestNN::test_grid_sample, test/test_nn.py::TestNN::test_grid_sample_3d, test/test_nn.py::TestNN::test_grid_sample_error_checking, test/test_nn.py::TestNN::test_grid_sample_nearest_neighbor_rounding_mode_consistency, test/test_nn.py::TestNN::test_hardtanh_backward, test/test_nn.py::TestNN::test_hardtanh_inplace_gradgrad, test/test_nn.py::TestNN::test_huber_loss_invalid_delta, test/test_nn.py::TestNN::test_inplace_thnn, test/test_nn.py::TestNN::test_interpolate, test/test_nn.py::TestNN::test_interpolate_bicubic_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_shared_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, 
test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_shared_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_buffer_overflow, test/test_nn.py::TestNN::test_interpolate_illegal_memory_access, test/test_nn.py::TestNN::test_interpolate_linear_1d, test/test_nn.py::TestNN::test_interpolate_linear_1d_align_corners, test/test_nn.py::TestNN::test_interpolate_linear_1d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_linear_1d_cuda, test/test_nn.py::TestNN::test_interpolate_linear_1d_zero_dim, 
test/test_nn.py::TestNN::test_interpolate_linear_1d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_align_corners, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_cuda, test/test_nn.py::TestNN::test_interpolate_linear_tuple_1d, test/test_nn.py::TestNN::test_interpolate_linear_tuple_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_1d, test/test_nn.py::TestNN::test_interpolate_nearest_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_1d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_1d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d, test/test_nn.py::TestNN::test_interpolate_nearest_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d_launch_configs, test/test_nn.py::TestNN::test_interpolate_nearest_2d_launch_configs_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_3d, test/test_nn.py::TestNN::test_interpolate_nearest_3d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_3d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_3d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_1d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_2d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_1d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_2d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_2d_cuda, 
test/test_nn.py::TestNN::test_interpolate_nearest_tuple_3d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_zero_dim, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_align_corners, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_align_corners, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_cuda, test/test_nn.py::TestNN::test_interpolate_undefined_behavior_casting, test/test_nn.py::TestNN::test_kl_div_log_softmax_target, test/test_nn.py::TestNN::test_kl_div_with_diff_type, test/test_nn.py::TestNN::test_kl_div_with_diff_type_log_target, test/test_nn.py::TestNN::test_l1_loss_correct, test/test_nn.py::TestNN::test_layer_norm_backwards_eps, test/test_nn.py::TestNN::test_layer_norm_eps, test/test_nn.py::TestNN::test_layer_norm_grads_with_create_graph_flag, test/test_nn.py::TestNN::test_layer_norm_large_tensor, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCSR, 
test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightStrided, test/test_nn.py::TestNN::test_linear_broadcasting, test/test_nn.py::TestNN::test_linear_raise_on_scalar_input, test/test_nn.py::TestNN::test_log_softmax_dim0, test/test_nn.py::TestNN::test_log_softmax_dim0_cuda, test/test_nn.py::TestNN::test_log_softmax_dim3, test/test_nn.py::TestNN::test_log_softmax_dim3_cuda, test/test_nn.py::TestNN::test_log_softmax_lastdim, test/test_nn.py::TestNN::test_log_softmax_lastdim_cuda, test/test_nn.py::TestNN::test_log_softmax_scalar, test/test_nn.py::TestNN::test_log_softmax_scalar_cuda, test/test_nn.py::TestNN::test_log_softmax_spatial, test/test_nn.py::TestNN::test_log_softmax_spatial_cuda, test/test_nn.py::TestNN::test_log_softmax_spatial_special, test/test_nn.py::TestNN::test_log_softmax_spatial_special_cuda, test/test_nn.py::TestNN::test_loss_equal_input_target_shape, test/test_nn.py::TestNN::test_margin_ranking_loss_margin_no_reduce, test/test_nn.py::TestNN::test_margin_ranking_loss_no_reduce, test/test_nn.py::TestNN::test_max_pool1d_invalid_output_size, test/test_nn.py::TestNN::test_module_apply_inplace_op, test/test_nn.py::TestNN::test_module_backcompat, test/test_nn.py::TestNN::test_module_super_init, test/test_nn.py::TestNN::test_module_to_argparse, test/test_nn.py::TestNN::test_modules, test/test_nn.py::TestNN::test_mse_loss_size_warning, 
test/test_nn.py::TestNN::test_multimarginloss_1d_input_0d_target_no_reduce, test/test_nn.py::TestNN::test_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_nn.py::TestNN::test_named_children, test/test_nn.py::TestNN::test_named_modules, test/test_nn.py::TestNN::test_named_parameters_remove_duplicate, test/test_nn.py::TestNN::test_native_channel_shuffle_return_alias_of_self, test/test_nn.py::TestNN::test_nested_tensor_from_mask, test/test_nn.py::TestNN::test_nested_tensor_from_mask_error, test/test_nn.py::TestNN::test_no_grad, test/test_nn.py::TestNN::test_non_leaf_parameters, test/test_nn.py::TestNN::test_normalize, test/test_nn.py::TestNN::test_overwrite_module_params_on_conversion, test/test_nn.py::TestNN::test_pack_sequence_batch_sizes_throw, test/test_nn.py::TestNN::test_pad_scalar_error, test/test_nn.py::TestNN::test_padding_list, test/test_nn.py::TestNN::test_pairwise_distance, test/test_nn.py::TestNN::test_parameter_assignment, test/test_nn.py::TestNN::test_parameterlistdict_pickle, test/test_nn.py::TestNN::test_parameterlistdict_setting_attributes, test/test_nn.py::TestNN::test_parameters_and_named_parameters, test/test_nn.py::TestNN::test_parameters_to_vector, test/test_nn.py::TestNN::test_parse_to, test/test_nn.py::TestNN::test_partial_flat_weights, test/test_nn.py::TestNN::test_pdist, test/test_nn.py::TestNN::test_pdist_cpu_gradgrad_unimplemented, test/test_nn.py::TestNN::test_pdist_cuda_gradgrad_unimplemented, test/test_nn.py::TestNN::test_pdist_empty_col, test/test_nn.py::TestNN::test_pdist_empty_row, test/test_nn.py::TestNN::test_pdist_large, test/test_nn.py::TestNN::test_pdist_zeros, test/test_nn.py::TestNN::test_pickle_module_no_weights_only_warning, test/test_nn.py::TestNN::test_pixel_shuffle_nhwc_cpu, test/test_nn.py::TestNN::test_pixel_shuffle_unshuffle, test/test_nn.py::TestNN::test_pointwise_loss_broadcast, test/test_nn.py::TestNN::test_pointwise_loss_target_grad_none_reduction, 
test/test_nn.py::TestNN::test_projections_errors_on_gru_and_rnn, test/test_nn.py::TestNN::test_projections_lstm_args_check, test/test_nn.py::TestNN::test_projections_lstm_check_device, test/test_nn.py::TestNN::test_projections_lstm_initial_hidden_state, test/test_nn.py::TestNN::test_register_buffer_allows_overwriting_with_same_name, test/test_nn.py::TestNN::test_register_buffer_allows_tensor_like_object, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_name_is_not_string, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_not_tensor, test/test_nn.py::TestNN::test_register_parameter_allows_overwriting_with_same_name, test/test_nn.py::TestNN::test_register_parameter_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_register_parameter_raises_error_if_name_is_not_string, test/test_nn.py::TestNN::test_relu_inplace_on_view, test/test_nn.py::TestNN::test_repr, test/test_nn.py::TestNN::test_requires_grad_, test/test_nn.py::TestNN::test_rnn_args_check, test/test_nn.py::TestNN::test_rnn_check_device, test/test_nn.py::TestNN::test_rnn_initial_hidden_state, test/test_nn.py::TestNN::test_rnn_weight_norm, test/test_nn.py::TestNN::test_set_submodule, test/test_nn.py::TestNN::test_share_memory, test/test_nn.py::TestNN::test_smoothl1loss_intergral_target, test/test_nn.py::TestNN::test_smoothl1loss_negative_beta_not_supported, test/test_nn.py::TestNN::test_softmax_functional_dim0, test/test_nn.py::TestNN::test_softmax_functional_dim0_cuda, test/test_nn.py::TestNN::test_softmax_functional_dim3, test/test_nn.py::TestNN::test_softmax_functional_dim3_cuda, test/test_nn.py::TestNN::test_softmax_functional_scalar, test/test_nn.py::TestNN::test_softmax_functional_scalar_cuda, test/test_nn.py::TestNN::test_softmax_lastdim, test/test_nn.py::TestNN::test_softmax_lastdim_cuda, test/test_nn.py::TestNN::test_softmax_lastdim_dtype, test/test_nn.py::TestNN::test_softmax_lastdim_dtype_cuda, 
test/test_nn.py::TestNN::test_softmax_spatial, test/test_nn.py::TestNN::test_softmax_spatial_cuda, test/test_nn.py::TestNN::test_softmax_spatial_dtype, test/test_nn.py::TestNN::test_softmax_spatial_dtype_cuda, test/test_nn.py::TestNN::test_softmax_spatial_special, test/test_nn.py::TestNN::test_softmax_spatial_special_cuda, test/test_nn.py::TestNN::test_softmin, test/test_nn.py::TestNN::test_spectral_norm, test/test_nn.py::TestNN::test_spectral_norm_dim, test/test_nn.py::TestNN::test_spectral_norm_forward, test/test_nn.py::TestNN::test_spectral_norm_load_state_dict, test/test_nn.py::TestNN::test_spectral_norm_pickle, test/test_nn.py::TestNN::test_state_dict, test/test_nn.py::TestNN::test_swap_module_params_poisons_acc_grad, test/test_nn.py::TestNN::test_sync_batchnorm_accuracy_cuda, test/test_nn.py::TestNN::test_sync_batchnorm_backward_elemt, test/test_nn.py::TestNN::test_threshold_bfloat16_half, test/test_nn.py::TestNN::test_threshold_int, test/test_nn.py::TestNN::test_to, test/test_nn.py::TestNN::test_train_errors_for_invalid_mode, test/test_nn.py::TestNN::test_transformer_args_check, test/test_nn.py::TestNN::test_transformer_layer_args_check, test/test_nn.py::TestNN::test_transformerdecoder, test/test_nn.py::TestNN::test_transformerdecoderlayer, test/test_nn.py::TestNN::test_transformerdecoderlayer_gelu, test/test_nn.py::TestNN::test_triplet_margin_loss, test/test_nn.py::TestNN::test_triplet_margin_loss_no_reduce, test/test_nn.py::TestNN::test_triplet_margin_loss_swap, test/test_nn.py::TestNN::test_triplet_margin_loss_swap_no_reduce, test/test_nn.py::TestNN::test_type, test/test_nn.py::TestNN::test_unflatten, test/test_nn.py::TestNN::test_unflatten_invalid_arg, test/test_nn.py::TestNN::test_unfold_invalid_arg, test/test_nn.py::TestNN::test_upsamplingBilinear2d_spatial_invariance, test/test_nn.py::TestNN::test_upsamplingLinear1d, test/test_nn.py::TestNN::test_upsamplingLinear1d_spatial_invariance, 
test/test_nn.py::TestNN::test_upsamplingTrilinear3d_spatial_invariance, test/test_nn.py::TestNN::test_upsampling_bfloat16, test/test_nn.py::TestNN::test_upsampling_not_recompute_scale_factor, test/test_nn.py::TestNN::test_upsampling_small_scale, test/test_nn.py::TestNN::test_vector_to_parameters, test/test_nn.py::TestNN::test_weight_norm, test/test_nn.py::TestNN::test_weight_norm_pickle, test/test_nn.py::TestNN::test_weighted_huber_loss, test/test_nn.py::TestNN::test_weighted_l1_loss_with_weights, test/test_nn.py::TestNN::test_weighted_mse_loss, test/test_nn.py::TestNN::test_zero_grad, test/test_nn.py::TestFusionEval::test_fuse_module_eval_numerics, test/test_nn.py::TestConstantPadNd::test_constant_pad_nd, test/test_nn.py::TestConstantPadNd::test_preserves_memory_format, test/test_nn.py::TestAddRelu::test_add_relu, test/test_nn.py::TestAddRelu::test_add_relu_broadcasting, test/test_nn.py::TestFunctionalPickle::test_pickle_softsign, test/test_nn.py::TestFusionUtils::test_fuse_conv_bn_requires_grad, test/test_nn.py::TestFusionUtils::test_fuse_linear_bn_requires_grad, test/test_nn.py::TestUtils::test_consume_prefix_in_state_dict_if_present, test/test_nn.py::TestNNDeviceTypeCUDA::test_BatchNorm_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_Bilinear_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_empty_target_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_mean_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_mean_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_none_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_none_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_sum_use_module_form_False_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_sum_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GRU_grad_and_gradgrad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_memory_format_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_numeric_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_raises_error_if_one_value_per_group_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm1d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm2d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm3d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_differentiable_backward_using_oneDNN_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_differentiable_backward_using_oneDNN_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_grad_and_gradgrad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_LayerNorm_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LayerNorm_numeric_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LocalResponseNorm_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_race_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_race_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_warnings_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad2d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad2d_large_deterministic_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad3d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_empty_cuda_complex64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_fails_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad1d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad2d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad3d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad_empty_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerDecoderLayer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerDecoder_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerEncoderLayer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerEncoder_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_Transformer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_Unfold_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_half_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_half_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_adaptiveavg_pool1d_shmem_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate45_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate90_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotateRandom_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_3d_rotateRandom_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_avg_pool_large_tensor2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_avg_pool_large_tensor_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_cuda_float32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_large_batch_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_large_batch_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_update_stats_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_channel_shuffle_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_error_if_nonfinite_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_0_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_1_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_4_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_inf_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_0_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_1_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_2_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_4_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_inf_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_multi_device_foreach_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_multi_device_foreach_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_value_foreach_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_value_foreach_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_consistent_index_target_and_probs_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_errors_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_weight_ignore_indices_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_with_probs_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_2d_out_of_bounds_class_index_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_2d_out_of_bounds_class_index_cuda_float32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_index_target_unit_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_one_hot_target_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_all_reductions_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_mean_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_mean_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_none_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_none_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_sum_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_sum_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_unit_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cpu_length_cuda_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cuda_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_error_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_device_mask_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_elu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_elu_inplace_with_neg_alpha_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_fold_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_glu_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_bfloat16_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_half_precision_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_2d_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_2d_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_3d_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_3d_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_nan_inf_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_nan_inf_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardsigmoid_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_for_single_spatial_element_during_training_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_False_affine_True_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_less_than_one_value_per_channel_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_invalid_reduction_strings_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_layernorm_half_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_layernorm_weight_bias_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_with_neg_slope_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_with_zero_slope_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_linear_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_big_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_big_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_logsigmoid_out_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_lstmcell_backward_only_one_output_grad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_TxT_layout_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_devices_parity_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_forward_with_nans_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_lowp_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_lowp_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_mask_types_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_transformer_layout_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_mish_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_non_recursive_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_mse_loss_error_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_1d_input_1d_target_invalid_size_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_all_ignored_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_byte_target_matches_long_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_mean_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_none_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_sum_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_invalid_target_dim_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_invalid_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_mismatched_batch_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_out_of_bounds_ignore_index_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_total_weight_is_zero_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_scalars_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_scalars_reductions_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nonlinearity_propagate_nan_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_one_hot_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_overwrite_module_params_on_conversion_cpu_device_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_pad_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_pad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_prelu_backward_32bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_replicatepad_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_numeric_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_numeric_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_fused_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_fused_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_save_lstm_compatibility_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_silu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_skip_init_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smooth_l1_loss_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smooth_l1_loss_vs_huber_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smoothl1loss_backward_zero_beta_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_64bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_smem_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_unaligned_grad_output_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_unaligned_output_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_without_fully_vectorized_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_double_cuda_float64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_forward_64bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_results_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_results_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_softplus_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softplus_low_threshold_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_negative_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_threshold_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_complex64, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_fast_path_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_triplet_margin_with_distance_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_triplet_margin_with_distance_loss_default_parity_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_False_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_False_input_size_403_output_size_377_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_True_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_True_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_False_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_False_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_True_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_True_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bilinear_memory_format0_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int64_cuda_int64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int32_cuda_int32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int16_cuda_int16, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_float64_cuda_float64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_float32_cuda_float32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_uint8_cuda_uint8, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int8_cuda_int8, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int64_cuda_int64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int32_cuda_int32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_aa_correctness_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_aa_correctness_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_correctness_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBilinear2d_aa_correctness_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBilinear2d_aa_correctness_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_correctness_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_correctness_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_fail_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_rocm_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format0_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format0_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format1_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format1_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format0_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format0_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format1_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format1_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_correctness_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_correctness_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_rescale_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format0_isize_20_osize_11_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_False_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_False_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_True_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_True_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsampling_64bit_indexing_channels_last_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsampling_64bit_indexing_channels_last_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingnearest2d_backward_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_warp_softmax_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_warp_softmax_64bit_indexing_cuda_float32 2025-09-07T06:56:11.1403170Z 2025-09-07T06:56:11.1403338Z Running nn/test_pooling 1/1 ... 
[2025-09-07 06:56:10.957837] 2025-09-07T06:56:11.1403679Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:11.1404531Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pooling.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:56:10.958163] 2025-09-07T06:56:20.0352435Z 2025-09-07T06:56:20.0353679Z nn/test_pooling 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pooling_1.1_09fb84722f0f9fb9_.log 2025-09-07T06:56:20.0401230Z Running 143 items in this shard: test/nn/test_pooling.py::TestAvgPool::test_avg_pool1d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool2d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool3d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d_with_divisor, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d_with_divisor, test/nn/test_pooling.py::TestPoolingNN::test_MaxUnpool2d_output_size, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_nhwc_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_backward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_forward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_non_contiguous, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_lower_precision, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_none, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_overflow, 
test/nn/test_pooling.py::TestPoolingNN::test_max_unpool, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool2d_nhwc_cpu, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool3d_input_check, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool1d_empty_kernel, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool3d, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool2d_empty_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool3d_backward_after_cat_dim1_device_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_batch_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case10_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case1_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case4_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case5_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case6_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case7_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case8_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case9_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool2d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool3d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_max_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pool_odd_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_uint8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool3d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_with_indices_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool3d_non_square_backward_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float16, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_large_size_int64_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_size_one_feature_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_bfloat16_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_2_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_zero_stride_cuda 2025-09-07T06:56:20.0444024Z 2025-09-07T06:56:20.0444232Z Running test_multiprocessing_spawn 1/1 ... [2025-09-07 06:56:20.035645] 2025-09-07T06:56:20.0444606Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:20.0445645Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_multiprocessing_spawn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:56:20.035958] 2025-09-07T06:56:37.1243896Z 2025-09-07T06:56:37.1245191Z test_multiprocessing_spawn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_multiprocessing_spawn_1.1_fdddd2e3977a2d29_.log 2025-09-07T06:56:37.1259182Z Running 31 items in this shard: test/test_multiprocessing_spawn.py::SpawnTest::test_exception_all, test/test_multiprocessing_spawn.py::SpawnTest::test_exception_raises, test/test_multiprocessing_spawn.py::SpawnTest::test_exception_single, test/test_multiprocessing_spawn.py::SpawnTest::test_first_argument_index, test/test_multiprocessing_spawn.py::SpawnTest::test_signal_raises, test/test_multiprocessing_spawn.py::SpawnTest::test_success, test/test_multiprocessing_spawn.py::SpawnTest::test_success_first_then_exception, test/test_multiprocessing_spawn.py::SpawnTest::test_success_non_blocking, test/test_multiprocessing_spawn.py::SpawnTest::test_terminate_exit_grace_period0, test/test_multiprocessing_spawn.py::SpawnTest::test_terminate_exit_grace_period_20, test/test_multiprocessing_spawn.py::SpawnTest::test_terminate_signal, test/test_multiprocessing_spawn.py::ForkTest::test_exception_all, test/test_multiprocessing_spawn.py::ForkTest::test_exception_single, test/test_multiprocessing_spawn.py::ForkTest::test_first_argument_index, 
test/test_multiprocessing_spawn.py::ForkTest::test_success, test/test_multiprocessing_spawn.py::ForkTest::test_success_first_then_exception, test/test_multiprocessing_spawn.py::ForkTest::test_success_non_blocking, test/test_multiprocessing_spawn.py::ForkTest::test_terminate_exit_grace_period0, test/test_multiprocessing_spawn.py::ForkTest::test_terminate_exit_grace_period_20, test/test_multiprocessing_spawn.py::ForkTest::test_terminate_signal, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_exception_all, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_exception_single, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_first_argument_index, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_success, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_success_first_then_exception, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_success_non_blocking, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_terminate_exit_grace_period0, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_terminate_exit_grace_period_20, test/test_multiprocessing_spawn.py::ParallelForkServerShouldWorkTest::test_terminate_signal, test/test_multiprocessing_spawn.py::ParallelForkServerPerfTest::test_forkserver_perf, test/test_multiprocessing_spawn.py::ErrorTest::test_errors_pickleable 2025-09-07T06:56:37.1268087Z 2025-09-07T06:56:37.1268275Z Running nn/test_convolution 1/1 ... [2025-09-07 06:56:37.124963] 2025-09-07T06:56:37.1268637Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:37.1269495Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_convolution.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:56:37.125285] 2025-09-07T06:56:42.5979747Z 2025-09-07T06:56:42.5981225Z nn/test_convolution 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_convolution_1.1_1a7ec63dfb79ae79_.log 2025-09-07T06:56:42.6256305Z Running 606 items in this shard: test/nn/test_convolution.py::TestConvolutionNN::test_Conv1d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_1x1, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_OneDNN, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_groups_nobias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_groups_nobias_v2, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types_on_GPU_with_cudnn, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types_on_GPU_without_cudnn, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_missing_argument, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_groups_nobias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_groups_wbias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_half_cublas_gemm, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_output_size, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_output_size_downsample_upsample, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose3d_correct_output_size, test/nn/test_convolution.py::TestConvolutionNN::test_conv1d_issue_120547, test/nn/test_convolution.py::TestConvolutionNN::test_conv2d_discontiguous_weight, 
test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_issue_120406, test/nn/test_convolution.py::TestConvolutionNN::test_conv_backcompat, test/nn/test_convolution.py::TestConvolutionNN::test_conv_cudnn_memory_layout_dominance, test/nn/test_convolution.py::TestConvolutionNN::test_conv_invalid_groups, test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size, test/nn/test_convolution.py::TestConvolutionNN::test_conv_padding_mode, test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck, test/nn/test_convolution.py::TestConvolutionNN::test_conv_tbc, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_non_contiguous, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_noncontiguous_weight, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_not_mutate_stride, test/nn/test_convolution.py::TestConvolutionNN::test_functional_grad_conv, test/nn/test_convolution.py::TestConvolutionNN::test_functional_grad_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv1d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv1d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv2d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv2d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv3d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv3d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grouped_conv_cudnn_nhwc_support, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d, test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_nnpack_conv, test/nn/test_convolution.py::TestConvolutionNN::test_permute_conv2d_issue_120211, 
test/nn/test_convolution.py::TestConvolutionNN::test_thnn_conv_strided_padded_dilated, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_backward_depthwise_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_backward_depthwise_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_large_output_padding_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_large_output_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose3d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_contig_wrong_stride_cudnn_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_valid_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_no_grad_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_64bit_indexing_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_backward_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_backward_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convTranspose_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_contiguous_for_oneDNN_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_mismatch_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_ndhwc_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_ndhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_groups_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_no_bias_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_stride_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_strided_with_3D_input_and_weight_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_empty_channel_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_empty_channel_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_ic1_channels_last_for_oneDNN_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_batch_1_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_nosplit_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_noncontig_weights_and_bias_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_noncontig_weights_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_thnn_nhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_thnn_nhwc_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transpose_with_output_size_and_no_batch_dim_ConvTranspose2d_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transpose_with_output_size_and_no_batch_dim_ConvTranspose3d_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transposed_large_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convert_conv2d_weight_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convert_conv3d_weight_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_add_relu_cuda_float16, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_add_relu_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_relu_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_relu_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_depthwise_conv_64bit_indexing_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_group_convTranspose_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_group_conv_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float64 2025-09-07T06:56:42.6521822Z 2025-09-07T06:56:42.6521983Z Running test_overrides 1/1 ... [2025-09-07 06:56:42.599429] 2025-09-07T06:56:42.6522321Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:42.6523162Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_overrides.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:56:42.599730] 2025-09-07T06:56:49.1739562Z 2025-09-07T06:56:49.1740788Z test_overrides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_overrides_1.1_99d8ba705a6d594a_.log 2025-09-07T06:56:49.2120328Z Running 1478 items in this shard: test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_H___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_T___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__backward_hooks___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__base___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__cdata___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__grad_fn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__post_accumulate_grad_hooks___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__version___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_data___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_device___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_dtype___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad_fn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_imag___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_cpu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_cuda___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_ipu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_leaf___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_maia___get__, 
test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_meta___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mkldnn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mps___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mtia___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_nested___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_quantized___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_sparse___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_sparse_csr___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_vulkan___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_xla___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_xpu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_itemsize___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_layout___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_mH___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_mT___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_name___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_names___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_nbytes___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_ndim___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_output_nr___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_real___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_requires_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_retains_grad___get__, 
test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_shape___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_volatile___get__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___add__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___and__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___array__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___array_wrap__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___bool__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___complex__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___contains__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___cuda_array_interface_____get__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___deepcopy__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___div__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___dlpack__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___dlpack_device__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___eq__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___float__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___floordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___format__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ge__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___getitem__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___gt__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___iadd__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___iand__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___idiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ifloordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ilshift__, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___imod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___imul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___index__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___int__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___invert__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ior__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___irshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___isub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ixor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___le__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___len__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___long__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___lshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___lt__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___matmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___mod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___mul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ne__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___nonzero__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___or__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___radd__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rand__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rdiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___reduce_ex__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___repr__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___reversed__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rfloordiv__, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rlshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmatmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ror__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rpow__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rrshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rsub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rxor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___setitem__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___setstate__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___sub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___truediv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___xor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__autocast_to_full_precision, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__autocast_to_reduced_precision, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__clear_non_serializable_cached_data, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__coalesced_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__dimI, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__dimV, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__is_view, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_storage_offsets, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_strides, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nnz, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__sparse_mask_projection, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__to_dense, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__update_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__values, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_abs, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_abs_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_absolute, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_absolute_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addbmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addbmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcdiv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcdiv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcmul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcmul_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addr, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addr_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_adjoint, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_align_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_align_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_all, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_allclose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_amax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_amin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_aminmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_angle, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_any, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_apply_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argmin, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argsort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argwhere, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_backward, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_baddbmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_baddbmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bernoulli, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bernoulli_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bfloat16, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bincount, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_and, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_and_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_left_shift, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_left_shift_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_not, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_not_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_or, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_or_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_right_shift, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_right_shift_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_xor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_xor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bool, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_broadcast_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_byte, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cauchy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ccol_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cdouble, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ceil, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ceil_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cfloat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_chalf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_char, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_max, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_max_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_min, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_min_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clip, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clip_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clone, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_coalesce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_col_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj_physical, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj_physical_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_contiguous, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copysign, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copysign_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_corrcoef, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_count_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cov, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cpu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cross, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_crow_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cuda, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cummax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cummin, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumprod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumprod_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumsum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumsum_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_data_ptr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_deg2rad, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_deg2rad_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dense_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dequantize, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_det, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_detach, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_detach_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diag, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diag_embed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagflat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagonal_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diff, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_digamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dim_order, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dist, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_div, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_div_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_divide_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_double, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_element_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_eq, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_eq_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erf_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfinv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expand, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expand_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expm1_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exponential_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fill_diagonal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fix, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fix_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flatten, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flip, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fliplr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flipud, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float_power, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float_power_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmod_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frac, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frac_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gather, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gcd, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gcd_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ge, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ge_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_geometric_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_geqrf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ger, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_get_device, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_half, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hardshrink, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_has_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hash_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_heaviside, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_heaviside_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_histc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_histogram, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hypot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hypot_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_i0, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_i0_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igammac, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igammac_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_copy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_copy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_fill, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_fill_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_put, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_put_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_reduce_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_inner, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_int, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_int_repr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ipu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_coalesced, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_complex, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_contiguous, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_distributed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_floating_point, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_inference, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_pinned, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_same_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_set_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_shared, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_signed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isclose, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isfinite, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isinf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isnan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isneginf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isposinf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isreal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_istft, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_item, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_kron, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_kthvalue, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lcm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lcm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ldexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ldexp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_le, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_le_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lerp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lerp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lgamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lgamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log10, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log10_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log1p, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log1p_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logaddexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logaddexp2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logcumsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logdet, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_and, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_and_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_not, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_not_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_or, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_or_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_xor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_xor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logit_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_long, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_map2_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_map_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_fill, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_scatter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_max, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_maximum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mean, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_median, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_min, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_minimum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mode, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_module_load, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_moveaxis, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_movedim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_msort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mtia, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mul_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multinomial, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multiply, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multiply_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mvlgamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mvlgamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nan_to_num, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nan_to_num_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanmean, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanmedian, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanquantile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nansum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_narrow, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_narrow_copy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ndimension, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ne, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ne_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_neg_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_negative, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_negative_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nelement, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nextafter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nextafter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nonzero_static, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_norm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_not_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_not_equal_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_numel, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_numpy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_orgqr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ormqr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_outer, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_permute, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pin_memory, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pinverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_polygamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_positive, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pow, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pow_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_prelu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_prod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_put, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_put_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_axis, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_scales, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_zero_points, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_scale, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_zero_point, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_qr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_qscheme, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_quantile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rad2deg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rad2deg_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_random_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ravel, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reciprocal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reciprocal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_record_stream, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_refine_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_register_hook, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_register_post_accumulate_grad_hook, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_relu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_relu_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_remainder, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_remainder_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rename, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rename_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_renorm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_renorm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_repeat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_repeat_interleave, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_requires_grad_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reshape, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reshape_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as_sparse_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resolve_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resolve_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_retain_grad, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_roll, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rot90, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_round, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_round_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_row_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rsqrt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rsqrt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_reduce_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_select_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_set_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sgn, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sgn_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_share_memory_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_short, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sigmoid_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sign, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sign_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_signbit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slice_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slice_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_smm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_mask, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_resize_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_resize_and_clear_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sqrt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_square, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_square_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_squeeze, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_squeeze_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sspaddmm, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_std, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_stft, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage_offset, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage_type, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sub, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sub_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_subtract, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_subtract_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sum_to_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_svd, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapaxes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapaxes_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapdims, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapdims_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_t, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_t_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_take, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_take_along_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_dense, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_mkldnn, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_sparse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tolist, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_topk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trace, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_transpose_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triangular_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tril, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tril_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triu_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_true_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_true_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trunc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trunc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_type, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_type_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unbind, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unfold, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unique, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unique_consecutive, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsqueeze, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsqueeze_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_untyped_storage, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_values, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_var, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_vdot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_view, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_view_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_vsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_where, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xlogy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xpu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_zero_, test/test_overrides.py::TestTorchFunctionOverride::test_base, test/test_overrides.py::TestTorchFunctionOverride::test_dtype_override, test/test_overrides.py::TestTorchFunctionOverride::test_grad, test/test_overrides.py::TestTorchFunctionOverride::test_has_torch_function_non_sequence, test/test_overrides.py::TestTorchFunctionOverride::test_mean_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_mm_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_pow_rpow, test/test_overrides.py::TestTorchFunctionOverride::test_precedence_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_tensor_subclass_propagation, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fftshift, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfft, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifftshift, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cholesky_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cond, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cross, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_det, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eig, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigh, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigvals, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigvalsh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_householder_product, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_inv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_inv_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_factor, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_factor_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lstsq, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_factor, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_factor_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_rank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_multi_dot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_pinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_qr, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve_triangular, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_svd, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_svdvals, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_tensorinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_tensorsolve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vander, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vecdot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vector_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_avg_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_avg_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_gelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_linear, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_log_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_one_hot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_scaled_dot_product_attention, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_softplus, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_softshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_airy_ai, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_j0, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_j1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_y0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_y1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_u, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_v, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_w, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_entr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erf, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfcx, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_expit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammainc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammaincc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammaln, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_hermite_polynomial_h, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_hermite_polynomial_he, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i0e, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i1e, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_laguerre_polynomial_l, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_legendre_polynomial_p, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log_ndtr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_logit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_i1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_k0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_k1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_multigammaln, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_ndtr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_ndtri, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_psi, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_round, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_scaled_modified_bessel_k0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_scaled_modified_bessel_k1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_u, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_v, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_w, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_spherical_bessel_j0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_xlog1py, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_zeta, test/test_overrides.py::TestTorchFunctionOverride::test_torch__assert_async, test/test_overrides.py::TestTorchFunctionOverride::test_torch__conj_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__functional_assert_async, test/test_overrides.py::TestTorchFunctionOverride::test_torch__fused_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__fw_primal_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lobpcg_lobpcg, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lowrank_pca_lowrank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lowrank_svd_lowrank, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__make_dual_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__native_batch_norm_legit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__neg_view_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__reshape_alias_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__rowwise_prune, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sparse_broadcast_to_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_acos, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_asin, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_atan, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_cos, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sin, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_tan, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__values_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__wrapped_linear_prepack, test/test_overrides.py::TestTorchFunctionOverride::test_torch__wrapped_quantized_linear_prepacked, test/test_overrides.py::TestTorchFunctionOverride::test_torch_abs, test/test_overrides.py::TestTorchFunctionOverride::test_torch_absolute, test/test_overrides.py::TestTorchFunctionOverride::test_torch_acos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_acosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adaptive_avg_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adaptive_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_add, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_addbmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addcdiv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addcmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addmv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adjoint, test/test_overrides.py::TestTorchFunctionOverride::test_torch_affine_grid_generator, test/test_overrides.py::TestTorchFunctionOverride::test_torch_alias_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_all, test/test_overrides.py::TestTorchFunctionOverride::test_torch_allclose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_amax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_amin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_aminmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_angle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_any, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arccos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arccosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arcsin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arcsinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctan2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argsort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argwhere, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_as_strided_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_as_strided_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_asin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_asinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atan2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_avg_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_baddbmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_backward_elemt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_backward_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_elemt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_gather_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_gather_stats_with_counts, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_update_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bernoulli, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bilinear, test/test_overrides.py::TestTorchFunctionOverride::test_torch_binary_cross_entropy_with_logits, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bincount, test/test_overrides.py::TestTorchFunctionOverride::test_torch_binomial, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_and, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_left_shift, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_not, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_or, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_right_shift, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_xor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_broadcast_to, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bucketize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ccol_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ceil, test/test_overrides.py::TestTorchFunctionOverride::test_torch_celu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_channel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_choose_qparams_optimized, test/test_overrides.py::TestTorchFunctionOverride::test_torch_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clip, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clone, test/test_overrides.py::TestTorchFunctionOverride::test_torch_col_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_column_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_combinations, test/test_overrides.py::TestTorchFunctionOverride::test_torch_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_concat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_concatenate, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conj_physical, test/test_overrides.py::TestTorchFunctionOverride::test_torch_constant_pad_nd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_tbc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_copysign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_corrcoef, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosine_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosine_similarity, test/test_overrides.py::TestTorchFunctionOverride::test_torch_count_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cov, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cross, test/test_overrides.py::TestTorchFunctionOverride::test_torch_crow_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ctc_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cummax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cummin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumprod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumsum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumulative_trapezoid, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_deg2rad, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dequantize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_det, test/test_overrides.py::TestTorchFunctionOverride::test_torch_detach, test/test_overrides.py::TestTorchFunctionOverride::test_torch_detach_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diag_embed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagflat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diff, test/test_overrides.py::TestTorchFunctionOverride::test_torch_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_divide, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dsmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_embedding, test/test_overrides.py::TestTorchFunctionOverride::test_torch_embedding_bag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_empty_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_eq, test/test_overrides.py::TestTorchFunctionOverride::test_torch_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erfc, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_expand_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fake_quantize_per_channel_affine, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fake_quantize_per_tensor_affine, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_fp16_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_fp16_weight_fp32_activation, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_int8_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_int8_weight_fp32_activation, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_quantize_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_pack_gemm_matrix_fp16, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_pack_quantized_matrix, test/test_overrides.py::TestTorchFunctionOverride::test_torch_feature_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_feature_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fix, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flatten, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flip, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fliplr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flipud, test/test_overrides.py::TestTorchFunctionOverride::test_torch_float_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch_floor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_floor_divide, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frac, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frobenius_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_full_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_empty_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_float_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_scalar_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_mixed_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_nested_tuple_getitem, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_not_first_in_list, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_precedence_in_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_block_diag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_broadcast_tensors, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_cartesian_prod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_cdist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_chain_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_einsum, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_lu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_meshgrid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_stft, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_tensordot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unique, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unique_consecutive, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unravel_index, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fused_moving_avg_obs_fake_quant, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gather, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gcd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ge, test/test_overrides.py::TestTorchFunctionOverride::test_torch_geqrf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ger, test/test_overrides.py::TestTorchFunctionOverride::test_torch_get_device, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gradient, test/test_overrides.py::TestTorchFunctionOverride::test_torch_greater, test/test_overrides.py::TestTorchFunctionOverride::test_torch_greater_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler_2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler_3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gru, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gru_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gt, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_hardshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hash_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_heaviside, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hinge_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histogram, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histogramdd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hsmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hypot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch_igamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_igammac, test/test_overrides.py::TestTorchFunctionOverride::test_torch_imag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_fill, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_put, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_inner, test/test_overrides.py::TestTorchFunctionOverride::test_torch_instance_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_int_repr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_complex, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_distributed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_floating_point, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_inference, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_same_size, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_signed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isclose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isfinite, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isinf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isnan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isneginf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isposinf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isreal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_istft, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kl_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kron, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kthvalue, test/test_overrides.py::TestTorchFunctionOverride::test_torch_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lcm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ldexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_le, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lerp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_less, test/test_overrides.py::TestTorchFunctionOverride::test_torch_less_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lgamma, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_log, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log10, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logaddexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logaddexp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logcumsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_and, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_or, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_xor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lstm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lstm_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lu_unpack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_margin_ranking_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_fill, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matrix_power, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_maximum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_median, test/test_overrides.py::TestTorchFunctionOverride::test_torch_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_minimum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_add_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_depthwise_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_rnn, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mode, test/test_overrides.py::TestTorchFunctionOverride::test_torch_moveaxis, test/test_overrides.py::TestTorchFunctionOverride::test_torch_movedim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_msort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_multinomial, test/test_overrides.py::TestTorchFunctionOverride::test_torch_multiply, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mvlgamma, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nan_to_num, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanmean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanmedian, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanquantile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nansum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_narrow, test/test_overrides.py::TestTorchFunctionOverride::test_torch_narrow_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_channel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ne, test/test_overrides.py::TestTorchFunctionOverride::test_torch_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_negative, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nextafter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional__threshold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_avg_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_avg_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool2d_with_indices, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_affine_grid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_binary_cross_entropy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_binary_cross_entropy_with_logits, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_celu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_cosine_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_cross_entropy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_ctc_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_elu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_embedding, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_embedding_bag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_feature_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool2d, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_gaussian_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_glu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_grid_sample, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_gumbel_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_hardtanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_hinge_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_huber_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_instance_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_interpolate, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_kl_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_l1_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_leaky_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_local_response_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool2d, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_margin_ranking_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_mish, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_mse_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multi_head_attention_forward, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multi_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multilabel_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multilabel_soft_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_normalize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_pad, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_poisson_nll_loss, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_relu6, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_rrelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_selu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_silu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_smooth_l1_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_soft_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softsign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_tanhshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_triplet_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_triplet_margin_with_distance_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_unfold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_constant_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_kaiming_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nonzero_static, test/test_overrides.py::TestTorchFunctionOverride::test_torch_norm_except_dim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_not_equal, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nuclear_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_numel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ones_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_orgqr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ormqr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_outer, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pairwise_distance, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pdist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_permute, test/test_overrides.py::TestTorchFunctionOverride::test_torch_permute_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pinverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pixel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pixel_unshuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_poisson, test/test_overrides.py::TestTorchFunctionOverride::test_torch_poisson_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_polar, test/test_overrides.py::TestTorchFunctionOverride::test_torch_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_positive, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pow, test/test_overrides.py::TestTorchFunctionOverride::test_torch_prelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_prod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_put, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_axis, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_scales, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_zero_points, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_scale, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_zero_point, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_qr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_channel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_tensor_dynamic, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_gru_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_lstm_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_rnn_relu_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_rnn_tanh_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rad2deg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rand_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_randint_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_randn_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ravel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_real, test/test_overrides.py::TestTorchFunctionOverride::test_torch_reciprocal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_remainder, test/test_overrides.py::TestTorchFunctionOverride::test_torch_renorm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_repeat_interleave, test/test_overrides.py::TestTorchFunctionOverride::test_torch_reshape, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_resolve_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_resolve_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_relu_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_tanh_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_roll, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rot90, test/test_overrides.py::TestTorchFunctionOverride::test_torch_round, test/test_overrides.py::TestTorchFunctionOverride::test_torch_row_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_row_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rrelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rsqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rsub, test/test_overrides.py::TestTorchFunctionOverride::test_torch_saddmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_searchsorted, test/test_overrides.py::TestTorchFunctionOverride::test_torch_segment_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_selu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sgn, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sigmoid, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_sign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_signbit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch_smm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_with_sizes_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_square, test/test_overrides.py::TestTorchFunctionOverride::test_torch_squeeze, test/test_overrides.py::TestTorchFunctionOverride::test_torch_squeeze_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_std, test/test_overrides.py::TestTorchFunctionOverride::test_torch_std_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sub, test/test_overrides.py::TestTorchFunctionOverride::test_torch_subtract, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_svd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_swapaxes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_swapdims, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_float, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_int, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_ite, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_sum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch_t_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_take, test/test_overrides.py::TestTorchFunctionOverride::test_torch_take_along_dim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_threshold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_topk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trace, test/test_overrides.py::TestTorchFunctionOverride::test_torch_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_transpose_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trapezoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trapz, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triangular_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tril, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triplet_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_true_divide, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_trunc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unbind, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unbind_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unflatten, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unfold_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsqueeze, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsqueeze_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_values_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_var, test/test_overrides.py::TestTorchFunctionOverride::test_torch_var_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vdot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_complex_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_real, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_real_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_where, test/test_overrides.py::TestTorchFunctionOverride::test_torch_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_zeros_like, test/test_overrides.py::TestTorchFunctionOverride::test_user_implementation_raises, test/test_overrides.py::TestEinsumOverride::test_wrapper, test/test_overrides.py::TestGradCheckOverride::test_gradcheck, 
test/test_overrides.py::TestNamedTuple::test_max, test/test_overrides.py::TestGradNewOnesOverride::test_newones, test/test_overrides.py::TestPickle::test_pickle, test/test_overrides.py::TestBroadcastAllOverride::test_broadcast_all, test/test_overrides.py::TestWrapTorchFunction::test_wrap_torch_function, test/test_overrides.py::TestIndexing::test_getitem, test/test_overrides.py::TestIndexing::test_getitem_subclass, test/test_overrides.py::TestIndexing::test_setitem, test/test_overrides.py::TestIndexing::test_setitem_subclass, test/test_overrides.py::TestIndexing::test_setitem_val, test/test_overrides.py::TestIterator::test_iterator, test/test_overrides.py::TestRNN::test_rnn, test/test_overrides.py::TestDisabledTorchFunction::test_parameter_does_not_prevent_dispatch, test/test_overrides.py::TestResolveName::test_resolve_name, test/test_overrides.py::TestTorchFunctionWarning::test_torch_function_standalone_class, test/test_overrides.py::TestTorchFunctionWarning::test_torch_function_tensor_subclass, test/test_overrides.py::TestDisabledUserWarnings::test_no_implicit_user_warning_for_deprecated_functions, test/test_overrides.py::TestTorchFunctionMode::test_all_same_mode, test/test_overrides.py::TestTorchFunctionMode::test_basic, test/test_overrides.py::TestTorchFunctionMode::test_custom_device_type, test/test_overrides.py::TestTorchFunctionMode::test_device_context_semantics, test/test_overrides.py::TestTorchFunctionMode::test_disable_enable_subclass, test/test_overrides.py::TestTorchFunctionMode::test_disable_enable_torch_function_ctx, test/test_overrides.py::TestTorchFunctionMode::test_disable_subclass_mode, test/test_overrides.py::TestTorchFunctionMode::test_disable_subclass_not_mode, test/test_overrides.py::TestTorchFunctionMode::test_distributions_bernoulli, test/test_overrides.py::TestTorchFunctionMode::test_error_using_class_method_on_mode, test/test_overrides.py::TestTorchFunctionMode::test_factory_override, 
test/test_overrides.py::TestTorchFunctionMode::test_get_cur_mode, test/test_overrides.py::TestTorchFunctionMode::test_get_mode_stack, test/test_overrides.py::TestTorchFunctionMode::test_getitem_call, test/test_overrides.py::TestTorchFunctionMode::test_mode_notimplemented_loop, test/test_overrides.py::TestTorchFunctionMode::test_modes_handle_first, test/test_overrides.py::TestTorchFunctionMode::test_modes_return_notimplemented, test/test_overrides.py::TestTorchFunctionMode::test_nested_modes_with_python_has_torch_function, test/test_overrides.py::TestTorchFunctionMode::test_nested_same_mode, test/test_overrides.py::TestTorchFunctionMode::test_nn_parse_to, test/test_overrides.py::TestTorchFunctionMode::test_reentrant_mode_idiom, test/test_overrides.py::TestTorchFunctionMode::test_restacking_with_ancestor, test/test_overrides.py::TestTorchFunctionMode::test_subclass_hash, test/test_overrides.py::TestTorchFunctionMode::test_torch_function_all_disabled_api, test/test_overrides.py::TestTorchFunctionMode::test_with_mode, test/test_overrides.py::TestTorchFunctionMode::test_with_mode_created_separately, test/test_overrides.py::TestTorchFunctionMode::test_with_nested_modes 2025-09-07T06:56:49.2485383Z 2025-09-07T06:56:49.2485569Z Running test_mobile_optimizer 1/1 ... [2025-09-07 06:56:49.175846] 2025-09-07T06:56:49.2485930Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:49.2486811Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mobile_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:56:49.176156] 2025-09-07T06:56:53.0964711Z 2025-09-07T06:56:53.0965813Z test_mobile_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mobile_optimizer_1.1_a87228224a31ad70_.log 2025-09-07T06:56:53.0970455Z Running 7 items in this shard: test/test_mobile_optimizer.py::TestOptimizer::test_clone_module_with_class, test/test_mobile_optimizer.py::TestOptimizer::test_generate_mobile_module_lints, test/test_mobile_optimizer.py::TestOptimizer::test_hoist_conv_packed_params, test/test_mobile_optimizer.py::TestOptimizer::test_mobilenet_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_preserve_bundled_inputs_methods, test/test_mobile_optimizer.py::TestOptimizer::test_quantized_conv_no_asan_failures 2025-09-07T06:56:53.0973293Z 2025-09-07T06:56:53.0973503Z Running test_spectral_ops 1/1 ... [2025-09-07 06:56:53.096893] 2025-09-07T06:56:53.0974203Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:53.0975298Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_spectral_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:56:53.097307] 2025-09-07T06:56:58.7199612Z 2025-09-07T06:56:58.7201485Z test_spectral_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_spectral_ops_1.1_ad93365d1f7f193c_.log 2025-09-07T06:56:58.7294531Z Running 347 items in this shard: test/test_spectral_ops.py::TestFFTCUDA::test_batch_istft_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_complex_istft_real_equiv_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_definition_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_onesided_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_real_equiv_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_roundtrip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_roundtrip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_context_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_context_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_plan_cache_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_complex64, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft2_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_ifft_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_fftn_equivalence_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_fftn_equivalence_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_invalid_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_numpy_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfft2_cuda_bfloat16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfft_cuda_bfloat16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fftn_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft2_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfftn_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_ifft_rfft_irfft_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_input_modification_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_invalid_dtypes_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_plan_repeatable_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_int8, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_numpy_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_out_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_out_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_complex64, 
test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_frequencies_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_frequencies_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_against_librosa_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_linearity_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_of_sine_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_requires_window_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_simple_cases_cuda_float64, 
test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_various_params_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_with_padding_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_throws_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_rfft_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_fftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_hfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_ifftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_irfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_fftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_hfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_ifftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_irfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_align_to_window_only_requires_non_center_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_requires_complex_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_requires_window_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_roundtrip_complex_window_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_stft_roundtrip_complex_window_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_window_device_cuda 2025-09-07T06:56:58.7392680Z 
2025-09-07T06:56:58.7392946Z Running distributions/test_distributions 1/1 ... [2025-09-07 06:56:58.720569] 2025-09-07T06:56:58.7393371Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:58.7394293Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributions/test_distributions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:56:58.720877] 2025-09-07T06:57:03.1921666Z 2025-09-07T06:57:03.1923326Z distributions/test_distributions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributions.test_distributions_1.1_ab9a63f7cdfb3001_.log 2025-09-07T06:57:03.1996055Z Running 230 items in this shard: test/distributions/test_distributions.py::TestDistributions::test_argmax_relaxed_categorical, test/distributions/test_distributions.py::TestDistributions::test_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_bernoulli_3d, test/distributions/test_distributions.py::TestDistributions::test_bernoulli_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_beta_log_prob, test/distributions/test_distributions.py::TestDistributions::test_beta_sample, test/distributions/test_distributions.py::TestDistributions::test_beta_shape, test/distributions/test_distributions.py::TestDistributions::test_beta_underflow, test/distributions/test_distributions.py::TestDistributions::test_beta_underflow_gpu, test/distributions/test_distributions.py::TestDistributions::test_binomial, test/distributions/test_distributions.py::TestDistributions::test_binomial_bfloat16, test/distributions/test_distributions.py::TestDistributions::test_binomial_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_binomial_extreme_vals, test/distributions/test_distributions.py::TestDistributions::test_binomial_half, 
test/distributions/test_distributions.py::TestDistributions::test_binomial_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_binomial_log_prob_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_binomial_sample, test/distributions/test_distributions.py::TestDistributions::test_binomial_stable, test/distributions/test_distributions.py::TestDistributions::test_binomial_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_categorical_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_cauchy, test/distributions/test_distributions.py::TestDistributions::test_cdf_icdf_inverse, test/distributions/test_distributions.py::TestDistributions::test_cdf_log_prob, test/distributions/test_distributions.py::TestDistributions::test_chi2_sample, test/distributions/test_distributions.py::TestDistributions::test_chi2_shape, test/distributions/test_distributions.py::TestDistributions::test_continuous_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_continuous_bernoulli_3d, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_log_prob, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_log_prob_zero, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_mode, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_sample, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_shape, test/distributions/test_distributions.py::TestDistributions::test_distribution_expand, test/distributions/test_distributions.py::TestDistributions::test_distribution_subclass_expand, test/distributions/test_distributions.py::TestDistributions::test_enumerate_support_type, 
test/distributions/test_distributions.py::TestDistributions::test_exponential, test/distributions/test_distributions.py::TestDistributions::test_exponential_sample, test/distributions/test_distributions.py::TestDistributions::test_fishersnedecor, test/distributions/test_distributions.py::TestDistributions::test_fishersnedecor_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_gpu_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_gpu_shape, test/distributions/test_distributions.py::TestDistributions::test_gamma_log_prob_at_boundary, test/distributions/test_distributions.py::TestDistributions::test_gamma_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_shape, test/distributions/test_distributions.py::TestDistributions::test_generalized_pareto, test/distributions/test_distributions.py::TestDistributions::test_generalized_pareto_sample, test/distributions/test_distributions.py::TestDistributions::test_geometric, test/distributions/test_distributions.py::TestDistributions::test_geometric_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_geometric_sample, test/distributions/test_distributions.py::TestDistributions::test_gumbel, test/distributions/test_distributions.py::TestDistributions::test_gumbel_sample, test/distributions/test_distributions.py::TestDistributions::test_halfcauchy, test/distributions/test_distributions.py::TestDistributions::test_halfnormal, test/distributions/test_distributions.py::TestDistributions::test_halfnormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_halfnormal_sample, test/distributions/test_distributions.py::TestDistributions::test_has_examples, test/distributions/test_distributions.py::TestDistributions::test_independent_expand, test/distributions/test_distributions.py::TestDistributions::test_independent_shape, 
test/distributions/test_distributions.py::TestDistributions::test_invalid_parameter_broadcasting, test/distributions/test_distributions.py::TestDistributions::test_inversegamma, test/distributions/test_distributions.py::TestDistributions::test_inversegamma_sample, test/distributions/test_distributions.py::TestDistributions::test_kumaraswamy_mean_variance, test/distributions/test_distributions.py::TestDistributions::test_kumaraswamy_shape, test/distributions/test_distributions.py::TestDistributions::test_laplace, test/distributions/test_distributions.py::TestDistributions::test_laplace_sample, test/distributions/test_distributions.py::TestDistributions::test_lazy_property_grad, test/distributions/test_distributions.py::TestDistributions::test_lkj_cholesky_log_prob, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal_sample, test/distributions/test_distributions.py::TestDistributions::test_lognormal, test/distributions/test_distributions.py::TestDistributions::test_lognormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_lognormal_sample, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_moments, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_properties, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_shape, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_binomial_log_prob, 
test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_sample, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_shape, test/distributions/test_distributions.py::TestDistributions::test_mode, test/distributions/test_distributions.py::TestDistributions::test_multinomial_1d, test/distributions/test_distributions.py::TestDistributions::test_multinomial_1d_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_multinomial_2d, test/distributions/test_distributions.py::TestDistributions::test_multinomial_sequential_draw, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_moments, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_properties, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_shape, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_stable_with_precision_matrix, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial_log_prob, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial_log_prob_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_normal, test/distributions/test_distributions.py::TestDistributions::test_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_2d, 
test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_pareto, test/distributions/test_distributions.py::TestDistributions::test_pareto_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_forward_ad, test/distributions/test_distributions.py::TestDistributions::test_poisson_gpu_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_log_prob, test/distributions/test_distributions.py::TestDistributions::test_poisson_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_shape, test/distributions/test_distributions.py::TestDistributions::test_relaxed_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_relaxed_one_hot_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_relaxed_one_hot_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_repr, test/distributions/test_distributions.py::TestDistributions::test_rounded_relaxed_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_rsample_requires_grad, test/distributions/test_distributions.py::TestDistributions::test_sample_detached, test/distributions/test_distributions.py::TestDistributions::test_studentT, test/distributions/test_distributions.py::TestDistributions::test_studentT_log_prob, test/distributions/test_distributions.py::TestDistributions::test_studentT_sample, test/distributions/test_distributions.py::TestDistributions::test_support_attributes, test/distributions/test_distributions.py::TestDistributions::test_torch_binomial_dtype_errors, test/distributions/test_distributions.py::TestDistributions::test_uniform, test/distributions/test_distributions.py::TestDistributions::test_valid_parameter_broadcasting, test/distributions/test_distributions.py::TestDistributions::test_vonmises_logprob, 
test/distributions/test_distributions.py::TestDistributions::test_vonmises_sample, test/distributions/test_distributions.py::TestDistributions::test_wishart_log_prob, test/distributions/test_distributions.py::TestDistributions::test_wishart_moments, test/distributions/test_distributions.py::TestDistributions::test_wishart_properties, test/distributions/test_distributions.py::TestDistributions::test_wishart_sample, test/distributions/test_distributions.py::TestDistributions::test_wishart_shape, test/distributions/test_distributions.py::TestDistributions::test_wishart_stable_with_precision_matrix, test/distributions/test_distributions.py::TestDistributions::test_zero_excluded_binomial, test/distributions/test_distributions.py::TestRsample::test_beta_wrt_alpha, test/distributions/test_distributions.py::TestRsample::test_beta_wrt_beta, test/distributions/test_distributions.py::TestRsample::test_chi2, test/distributions/test_distributions.py::TestRsample::test_dirichlet_multivariate, test/distributions/test_distributions.py::TestRsample::test_dirichlet_on_diagonal, test/distributions/test_distributions.py::TestRsample::test_dirichlet_tangent_field, test/distributions/test_distributions.py::TestRsample::test_gamma, test/distributions/test_distributions.py::TestDistributionShapes::test_bernoulli_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_bernoulli_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_beta_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_beta_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_binomial_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_binomial_shape_vectorized_n, test/distributions/test_distributions.py::TestDistributionShapes::test_categorical_shape, 
test/distributions/test_distributions.py::TestDistributionShapes::test_cauchy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_cauchy_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_chi2_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_chi2_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_continuous_bernoulli_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_continuous_bernoulli_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_dirichlet_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_entropy_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_exponential_shape_scalar_param, test/distributions/test_distributions.py::TestDistributionShapes::test_exponential_shape_tensor_param, test/distributions/test_distributions.py::TestDistributionShapes::test_gamma_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_gamma_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_geometric_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_geometric_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_gumbel_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_halfcauchy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_halfcauchy_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_kumaraswamy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_laplace_shape_scalar_params, 
test/distributions/test_distributions.py::TestDistributionShapes::test_laplace_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_mixture_same_family_mean_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_mixture_same_family_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_multinomial_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_normal_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_normal_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_one_hot_categorical_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_pareto_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_studentT_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_studentT_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_uniform_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_uniform_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_vonmises_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_vonmises_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_weibull_scale_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_wishart_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_wishart_shape_tensor_params, test/distributions/test_distributions.py::TestKL::test_entropy_exponential_family, test/distributions/test_distributions.py::TestKL::test_entropy_monte_carlo, test/distributions/test_distributions.py::TestKL::test_kl_edgecases, test/distributions/test_distributions.py::TestKL::test_kl_exponential_family, 
test/distributions/test_distributions.py::TestKL::test_kl_infinite, test/distributions/test_distributions.py::TestKL::test_kl_lowrank_multivariate_normal, test/distributions/test_distributions.py::TestKL::test_kl_lowrank_multivariate_normal_batched, test/distributions/test_distributions.py::TestKL::test_kl_monte_carlo, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal_batched, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal_batched_broadcasted, test/distributions/test_distributions.py::TestKL::test_kl_shape, test/distributions/test_distributions.py::TestKL::test_kl_transformed, test/distributions/test_distributions.py::TestConstraints::test_params_constraints, test/distributions/test_distributions.py::TestConstraints::test_support_constraints, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_gradient, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_with_logits_overflow, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_with_logits_underflow, test/distributions/test_distributions.py::TestNumericalStability::test_categorical_log_prob, test/distributions/test_distributions.py::TestNumericalStability::test_categorical_log_prob_with_logits, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_gradient, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_with_logits_overflow, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_with_logits_underflow, test/distributions/test_distributions.py::TestNumericalStability::test_multinomial_log_prob, test/distributions/test_distributions.py::TestNumericalStability::test_multinomial_log_prob_with_logits, 
test/distributions/test_distributions.py::TestLazyLogitsInitialization::test_lazy_logits_initialization, test/distributions/test_distributions.py::TestLazyLogitsInitialization::test_lazy_probs_initialization, test/distributions/test_distributions.py::TestAgainstScipy::test_cdf, test/distributions/test_distributions.py::TestAgainstScipy::test_icdf, test/distributions/test_distributions.py::TestAgainstScipy::test_mean, test/distributions/test_distributions.py::TestAgainstScipy::test_variance_stddev, test/distributions/test_distributions.py::TestFunctors::test_cat_event_dim, test/distributions/test_distributions.py::TestFunctors::test_cat_transform, test/distributions/test_distributions.py::TestFunctors::test_cat_transform_non_uniform, test/distributions/test_distributions.py::TestFunctors::test_stack_transform, test/distributions/test_distributions.py::TestValidation::test_invalid, test/distributions/test_distributions.py::TestValidation::test_invalid_log_probs_arg, test/distributions/test_distributions.py::TestValidation::test_valid, test/distributions/test_distributions.py::TestValidation::test_warning_unimplemented_constraints, test/distributions/test_distributions.py::TestJit::test_cdf, test/distributions/test_distributions.py::TestJit::test_entropy, test/distributions/test_distributions.py::TestJit::test_enumerate_support, test/distributions/test_distributions.py::TestJit::test_log_prob, test/distributions/test_distributions.py::TestJit::test_mean, test/distributions/test_distributions.py::TestJit::test_rsample, test/distributions/test_distributions.py::TestJit::test_sample, test/distributions/test_distributions.py::TestJit::test_variance 2025-09-07T06:57:03.2064965Z 2025-09-07T06:57:03.2065107Z Running doctests 1/1 ... 
[2025-09-07 06:57:03.192785] 2025-09-07T06:57:03.2282516Z Start doctest_module('/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch') 2025-09-07T06:57:03.2283216Z Listing tests 2025-09-07T06:57:03.3078591Z msg = Cannot scrape callname=Library.fallback in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py line=375. 2025-09-07T06:57:03.3079817Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:03.3080667Z Registers the function implementation as the fallback for the given key. 2025-09-07T06:57:03.3081262Z 2025-09-07T06:57:03.3081671Z This function only works for a library with global namespace ("_"). 2025-09-07T06:57:03.3082226Z 2025-09-07T06:57:03.3082376Z Args: 2025-09-07T06:57:03.3083076Z fn: function used as fallback for the given dispatch key or :func:`~fallthrough_kernel` 2025-09-07T06:57:03.3084188Z to register a fallthrough. 2025-09-07T06:57:03.3084979Z dispatch_key: dispatch key that the input function should be registered for. By default, it uses 2025-09-07T06:57:03.3085529Z the dispatch key that the library was created with. 2025-09-07T06:57:03.3086058Z with_keyset: flag controlling if the current dispatcher call keyset should be passed as the first argument 2025-09-07T06:57:03.3086700Z to :attr:`fn` when calling. This should be used to create the appropriate keyset for redispatch calls. 2025-09-07T06:57:03.3087039Z 2025-09-07T06:57:03.3087131Z Example:: 2025-09-07T06:57:03.3087253Z 2025-09-07T06:57:03.3087357Z >>> my_lib = Library("_", "IMPL") 2025-09-07T06:57:03.3087656Z >>> def fallback_kernel(op, *args, **kwargs): 2025-09-07T06:57:03.3087964Z >>> # Handle all autocast ops generically 2025-09-07T06:57:03.3088236Z >>> # ... 
2025-09-07T06:57:03.3088502Z >>> my_lib.fallback(fallback_kernel, "Autocast") 2025-09-07T06:57:03.3088793Z 2025-09-07T06:57:03.3089402Z Original Error: IndentationError('expected an indented block after function definition on line 2', ('', 5, 1, 'my_lib.fallback(fallback_kernel, "Autocast")\n', 5, 7)) 2025-09-07T06:57:03.3089984Z 2025-09-07T06:57:03.3090094Z my_lib.fallback(fallback_kernel, "Autocast") 2025-09-07T06:57:03.3090361Z ^ 2025-09-07T06:57:03.3222322Z msg = Cannot scrape callname=register_fake in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py line=948. 2025-09-07T06:57:03.3223953Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:03.3224940Z Register a FakeTensor implementation ("fake impl") for this operator. 2025-09-07T06:57:03.3225498Z 2025-09-07T06:57:03.3225778Z Also sometimes known as a "meta kernel", "abstract impl". 2025-09-07T06:57:03.3226188Z 2025-09-07T06:57:03.3226565Z An "FakeTensor implementation" specifies the behavior of this operator on 2025-09-07T06:57:03.3227502Z Tensors that carry no data ("FakeTensor"). Given some input Tensors with 2025-09-07T06:57:03.3228443Z certain properties (sizes/strides/storage_offset/device), it specifies 2025-09-07T06:57:03.3229126Z what the properties of the output Tensors are. 2025-09-07T06:57:03.3229486Z 2025-09-07T06:57:03.3229819Z The FakeTensor implementation has the same signature as the operator. 2025-09-07T06:57:03.3230583Z It is run for both FakeTensors and meta tensors. To write a FakeTensor 2025-09-07T06:57:03.3231326Z implementation, assume that all Tensor inputs to the operator are 2025-09-07T06:57:03.3232086Z regular CPU/CUDA/Meta tensors, but they do not have storage, and 2025-09-07T06:57:03.3232579Z you are trying to return regular CPU/CUDA/Meta tensor(s) as output. 
2025-09-07T06:57:03.3233024Z The FakeTensor implementation must consist of only PyTorch operations 2025-09-07T06:57:03.3233459Z (and may not directly access the storage or data of any input or 2025-09-07T06:57:03.3233798Z intermediate Tensors). 2025-09-07T06:57:03.3233959Z 2025-09-07T06:57:03.3234102Z This API may be used as a decorator (see examples). 2025-09-07T06:57:03.3234317Z 2025-09-07T06:57:03.3234435Z For a detailed guide on custom ops, please see 2025-09-07T06:57:03.3234853Z https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html 2025-09-07T06:57:03.3235141Z 2025-09-07T06:57:03.3235215Z Args: 2025-09-07T06:57:03.3235509Z op_name: Operator name (along with the overload) or OpOverload object. 2025-09-07T06:57:03.3235890Z func: Fake tensor implementation. 2025-09-07T06:57:03.3236235Z lib (Optional[Library]): Library to register the fake tensor to. 2025-09-07T06:57:03.3236636Z allow_override: Flag controlling if we want to override an 2025-09-07T06:57:03.3237018Z existing registered fake impl. This is by default off, 2025-09-07T06:57:03.3237483Z and will error you're trying to register a fake impl to 2025-09-07T06:57:03.3237875Z an operator that already has a fake impl. This also only 2025-09-07T06:57:03.3238245Z applies if the custom operator was not created via 2025-09-07T06:57:03.3238625Z torch.library.custom_op, as overriding and existing fake 2025-09-07T06:57:03.3238962Z impl is already allowed. 
2025-09-07T06:57:03.3239136Z 2025-09-07T06:57:03.3239216Z Examples: 2025-09-07T06:57:03.3239416Z >>> import torch 2025-09-07T06:57:03.3239641Z >>> import numpy as np 2025-09-07T06:57:03.3239885Z >>> from torch import Tensor 2025-09-07T06:57:03.3240126Z >>> 2025-09-07T06:57:03.3240388Z >>> # Example 1: an operator without data-dependent output shape 2025-09-07T06:57:03.3240806Z >>> @torch.library.custom_op("mylib::custom_linear", mutates_args=()) 2025-09-07T06:57:03.3241248Z >>> def custom_linear(x: Tensor, weight: Tensor, bias: Tensor) -> Tensor: 2025-09-07T06:57:03.3241665Z >>> raise NotImplementedError("Implementation goes here") 2025-09-07T06:57:03.3241970Z >>> 2025-09-07T06:57:03.3242212Z >>> @torch.library.register_fake("mylib::custom_linear") 2025-09-07T06:57:03.3242520Z >>> def _(x, weight, bias): 2025-09-07T06:57:03.3242763Z >>> assert x.dim() == 2 2025-09-07T06:57:03.3243012Z >>> assert weight.dim() == 2 2025-09-07T06:57:03.3243270Z >>> assert bias.dim() == 1 2025-09-07T06:57:03.3243545Z >>> assert x.shape[1] == weight.shape[1] 2025-09-07T06:57:03.3243931Z >>> assert weight.shape[0] == bias.shape[0] 2025-09-07T06:57:03.3244234Z >>> assert x.device == weight.device 2025-09-07T06:57:03.3244490Z >>> 2025-09-07T06:57:03.3244688Z >>> return (x @ weight.t()) + bias 2025-09-07T06:57:03.3244937Z >>> 2025-09-07T06:57:03.3245182Z >>> with torch._subclasses.fake_tensor.FakeTensorMode(): 2025-09-07T06:57:03.3245568Z >>> x = torch.randn(2, 3) 2025-09-07T06:57:03.3245884Z >>> w = torch.randn(3, 3) 2025-09-07T06:57:03.3246132Z >>> b = torch.randn(3) 2025-09-07T06:57:03.3246396Z >>> y = torch.ops.mylib.custom_linear(x, w, b) 2025-09-07T06:57:03.3246669Z >>> 2025-09-07T06:57:03.3246855Z >>> assert y.shape == (2, 3) 2025-09-07T06:57:03.3247083Z >>> 2025-09-07T06:57:03.3247325Z >>> # Example 2: an operator with data-dependent output shape 2025-09-07T06:57:03.3247734Z >>> @torch.library.custom_op("mylib::custom_nonzero", mutates_args=()) 2025-09-07T06:57:03.3248110Z >>> def 
custom_nonzero(x: Tensor) -> Tensor: 2025-09-07T06:57:03.3248398Z >>> x_np = x.numpy(force=True) 2025-09-07T06:57:03.3248690Z >>> res = np.stack(np.nonzero(x_np), axis=1) 2025-09-07T06:57:03.3249004Z >>> return torch.tensor(res, device=x.device) 2025-09-07T06:57:03.3249287Z >>> 2025-09-07T06:57:03.3249544Z >>> @torch.library.register_fake("mylib::custom_nonzero") 2025-09-07T06:57:03.3249853Z >>> def _(x): 2025-09-07T06:57:03.3250110Z >>> # Number of nonzero-elements is data-dependent. 2025-09-07T06:57:03.3250446Z >>> # Since we cannot peek at the data in an fake impl, 2025-09-07T06:57:03.3250784Z >>> # we use the ctx object to construct a new symint that 2025-09-07T06:57:03.3251116Z >>> # represents the data-dependent size. 2025-09-07T06:57:03.3251412Z >>> ctx = torch.library.get_ctx() 2025-09-07T06:57:03.3251693Z >>> nnz = ctx.new_dynamic_size() 2025-09-07T06:57:03.3251958Z >>> shape = [nnz, x.dim()] 2025-09-07T06:57:03.3252253Z >>> result = x.new_empty(shape, dtype=torch.int64) 2025-09-07T06:57:03.3252546Z >>> return result 2025-09-07T06:57:03.3252755Z >>> 2025-09-07T06:57:03.3253005Z >>> from torch.fx.experimental.proxy_tensor import make_fx 2025-09-07T06:57:03.3253381Z >>> 2025-09-07T06:57:03.3253584Z >>> x = torch.tensor([0, 1, 2, 3, 4, 0]) 2025-09-07T06:57:03.3254078Z >>> trace = make_fx(torch.ops.mylib.custom_nonzero, tracing_mode="symbolic")(x) 2025-09-07T06:57:03.3254489Z >>> trace.print_readable() 2025-09-07T06:57:03.3254729Z >>> 2025-09-07T06:57:03.3255019Z >>> assert torch.allclose(trace(x), torch.ops.mylib.custom_nonzero(x)) 2025-09-07T06:57:03.3255305Z 2025-09-07T06:57:03.3255376Z 2025-09-07T06:57:03.3255897Z Original Error: IndentationError('expected an indented block after function definition on line 37', ('', 38, 1, '_._ = None\n', 38, 2)) 2025-09-07T06:57:03.3256394Z 2025-09-07T06:57:03.3256477Z _._ = None 2025-09-07T06:57:03.3256646Z ^ 2025-09-07T06:57:03.3363839Z msg = Cannot scrape callname=get_kernel in 
modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py line=1482. 2025-09-07T06:57:03.3365000Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:03.3365796Z Returns the computed kernel for a given operator and dispatch key. 2025-09-07T06:57:03.3366238Z 2025-09-07T06:57:03.3366564Z This function retrieves the kernel that would be executed for a given 2025-09-07T06:57:03.3367356Z operator and dispatch key combination. The returned SafeKernelFunction 2025-09-07T06:57:03.3368097Z can be used to call the kernel in a boxed fashion. The intended use 2025-09-07T06:57:03.3368797Z case for this function is to retrieve the original kernel for a given 2025-09-07T06:57:03.3369587Z dispatch key and then register another kernel to the same dispatch key 2025-09-07T06:57:03.3370512Z that calls into the original kernel for certain cases. 2025-09-07T06:57:03.3370898Z 2025-09-07T06:57:03.3371017Z Args: 2025-09-07T06:57:03.3371501Z op: Operator name (along with the overload) or OpOverload object 2025-09-07T06:57:03.3372392Z Can be a string (e.g., "aten::add.Tensor"), an OpOverload, or a CustomOpDef. 2025-09-07T06:57:03.3373525Z dispatch_key (str | torch.DispatchKey): The dispatch key to get the kernel for. 2025-09-07T06:57:03.3374748Z Can be a string (e.g., "CPU", "CUDA") or a DispatchKey enum value. 2025-09-07T06:57:03.3375262Z 2025-09-07T06:57:03.3375412Z Returns: 2025-09-07T06:57:03.3375815Z torch._C._SafeKernelFunction: A safe kernel function that can be used to 2025-09-07T06:57:03.3376216Z call the kernel. 2025-09-07T06:57:03.3376362Z 2025-09-07T06:57:03.3376446Z Raises: 2025-09-07T06:57:03.3376673Z RuntimeError: If the operator does not exist. 
2025-09-07T06:57:03.3376882Z 2025-09-07T06:57:03.3376965Z Example: 2025-09-07T06:57:03.3377174Z >>> # Get the CPU kernel for torch.add 2025-09-07T06:57:03.3377522Z >>> kernel = torch.library.get_kernel("aten::add.Tensor", "CPU") 2025-09-07T06:57:03.3377841Z >>> 2025-09-07T06:57:03.3378034Z >>> # You can also use DispatchKey enum 2025-09-07T06:57:03.3378430Z >>> kernel = torch.library.get_kernel("aten::add.Tensor", torch.DispatchKey.CPU) 2025-09-07T06:57:03.3378801Z >>> 2025-09-07T06:57:03.3378997Z >>> # Or use an OpOverload directly 2025-09-07T06:57:03.3379354Z >>> kernel = torch.library.get_kernel(torch.ops.aten.add.Tensor, "CPU") 2025-09-07T06:57:03.3379700Z >>> 2025-09-07T06:57:03.3379969Z >>> # Example: Using get_kernel in a custom op with conditional dispatch 2025-09-07T06:57:03.3380336Z >>> # Get the original kernel for torch.sin 2025-09-07T06:57:03.3380702Z >>> original_sin_kernel = torch.library.get_kernel("aten::sin", "CPU") 2025-09-07T06:57:03.3381033Z >>> 2025-09-07T06:57:03.3381317Z >>> # If input has negative values, use original sin, otherwise return zeros 2025-09-07T06:57:03.3381714Z >>> def conditional_sin_impl(dispatch_keys, x): 2025-09-07T06:57:03.3382011Z >>> if (x < 0).any(): 2025-09-07T06:57:03.3382425Z >>> return original_sin_kernel.call_boxed(dispatch_keys, x) 2025-09-07T06:57:03.3382756Z >>> else: 2025-09-07T06:57:03.3382984Z >>> return torch.zeros_like(x) 2025-09-07T06:57:03.3383241Z >>> 2025-09-07T06:57:03.3383456Z >>> lib = torch.library.Library("aten", "IMPL") 2025-09-07T06:57:03.3383878Z >>> # with_keyset=True so the first argument to the impl is the current DispatchKeySet 2025-09-07T06:57:03.3384361Z >>> which needs to be the first argument to ``kernel.call_boxed`` 2025-09-07T06:57:03.3384762Z >>> lib.impl("sin", conditional_sin_impl, "CPU", with_keyset=True) 2025-09-07T06:57:03.3385077Z >>> 2025-09-07T06:57:03.3385265Z >>> # Test the conditional behavior 2025-09-07T06:57:03.3385533Z >>> x_positive = torch.tensor([1.0, 2.0]) 
2025-09-07T06:57:03.3385814Z >>> x_mixed = torch.tensor([-1.0, 2.0]) 2025-09-07T06:57:03.3386086Z >>> torch.sin(x_positive) 2025-09-07T06:57:03.3386325Z tensor([0., 0.]) 2025-09-07T06:57:03.3386551Z >>> torch.sin(x_mixed) 2025-09-07T06:57:03.3386780Z tensor([-0.8415, 0.9093]) 2025-09-07T06:57:03.3387001Z 2025-09-07T06:57:03.3387477Z Original Error: SyntaxError('invalid syntax', ('', 23, 7, 'which needs to be the first argument to ``kernel.call_boxed``\n', 23, 12)) 2025-09-07T06:57:03.3387967Z 2025-09-07T06:57:03.3388120Z which needs to be the first argument to ``kernel.call_boxed`` 2025-09-07T06:57:03.3388435Z ^ 2025-09-07T06:57:04.0044207Z msg = Cannot scrape callname=is_available in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py line=66. 2025-09-07T06:57:04.0046203Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:04.0047083Z Check if the current accelerator is available at runtime: it was build, all the 2025-09-07T06:57:04.0047901Z required drivers are available and at least one device is visible. 2025-09-07T06:57:04.0048592Z See :ref:`accelerator` for details. 2025-09-07T06:57:04.0048956Z 2025-09-07T06:57:04.0049314Z Returns: 2025-09-07T06:57:04.0050107Z bool: A boolean indicating if there is an available :ref:`accelerator`. 2025-09-07T06:57:04.0050662Z 2025-09-07T06:57:04.0051065Z .. note:: This API delegates to the device-specific version of `is_available`. 2025-09-07T06:57:04.0051938Z On CUDA, when the environment variable ``PYTORCH_NVML_BASED_CUDA_CHECK=1`` is set, 2025-09-07T06:57:04.0052828Z this function will NOT poison fork. Otherwise, it will. For more details, see 2025-09-07T06:57:04.0053543Z :ref:`multiprocessing-poison-fork-note`. 2025-09-07T06:57:04.0054057Z 2025-09-07T06:57:04.0054207Z Example:: 2025-09-07T06:57:04.0054396Z 2025-09-07T06:57:04.0054802Z >>> assert torch.accelerator.is_available() "No available accelerators detected." 
2025-09-07T06:57:04.0055218Z 2025-09-07T06:57:04.0055786Z Original Error: SyntaxError('invalid syntax', ('', 1, 41, 'assert torch.accelerator.is_available() "No available accelerators detected."\n', 1, 78)) 2025-09-07T06:57:04.0056340Z 2025-09-07T06:57:04.0056555Z assert torch.accelerator.is_available() "No available accelerators detected." 2025-09-07T06:57:04.0056953Z ^ 2025-09-07T06:57:04.0068280Z msg = Cannot scrape callname=synchronize in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py line=212. 2025-09-07T06:57:04.0069002Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:04.0069484Z Wait for all kernels in all streams on the given device to complete. 2025-09-07T06:57:04.0069771Z 2025-09-07T06:57:04.0069847Z Args: 2025-09-07T06:57:04.0070219Z device (:class:`torch.device`, str, int, optional): device for which to synchronize. It must match 2025-09-07T06:57:04.0070763Z the current :ref:`accelerator` device type. If not given, 2025-09-07T06:57:04.0071379Z use :func:`torch.accelerator.current_device_index` by default. 2025-09-07T06:57:04.0071641Z 2025-09-07T06:57:04.0071894Z .. note:: This function is a no-op if the current :ref:`accelerator` is not initialized. 2025-09-07T06:57:04.0072236Z 2025-09-07T06:57:04.0072315Z Example:: 2025-09-07T06:57:04.0072440Z 2025-09-07T06:57:04.0072561Z >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA) 2025-09-07T06:57:04.0072989Z >>> assert torch.accelerator.is_available() "No available accelerators detected." 
2025-09-07T06:57:04.0073424Z >>> start_event = torch.Event(enable_timing=True) 2025-09-07T06:57:04.0073749Z >>> end_event = torch.Event(enable_timing=True) 2025-09-07T06:57:04.0074052Z >>> start_event.record() 2025-09-07T06:57:04.0074414Z >>> tensor = torch.randn(100, device=torch.accelerator.current_accelerator()) 2025-09-07T06:57:04.0074806Z >>> sum = torch.sum(tensor) 2025-09-07T06:57:04.0075070Z >>> end_event.record() 2025-09-07T06:57:04.0075330Z >>> torch.accelerator.synchronize() 2025-09-07T06:57:04.0075672Z >>> elapsed_time_ms = start_event.elapsed_time(end_event) 2025-09-07T06:57:04.0075975Z 2025-09-07T06:57:04.0076527Z Original Error: SyntaxError('invalid syntax', ('', 2, 41, 'assert torch.accelerator.is_available() "No available accelerators detected."\n', 2, 78)) 2025-09-07T06:57:04.0077067Z 2025-09-07T06:57:04.0077273Z assert torch.accelerator.is_available() "No available accelerators detected." 2025-09-07T06:57:04.0077660Z ^ 2025-09-07T06:57:04.0570443Z msg = Cannot scrape callname=cudart in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py line=434. 2025-09-07T06:57:04.0571617Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:04.0572299Z Retrieves the CUDA runtime API module. 2025-09-07T06:57:04.0572614Z 2025-09-07T06:57:04.0572620Z 2025-09-07T06:57:04.0573139Z This function initializes the CUDA runtime environment if it is not already 2025-09-07T06:57:04.0574248Z initialized and returns the CUDA runtime API module (_cudart). The CUDA 2025-09-07T06:57:04.0575073Z runtime API module provides access to various CUDA runtime functions. 2025-09-07T06:57:04.0575564Z 2025-09-07T06:57:04.0575687Z Args: 2025-09-07T06:57:04.0575925Z ``None`` 2025-09-07T06:57:04.0576041Z 2025-09-07T06:57:04.0576120Z Returns: 2025-09-07T06:57:04.0576352Z module: The CUDA runtime API module (_cudart). 
2025-09-07T06:57:04.0576557Z 2025-09-07T06:57:04.0576640Z Raises: 2025-09-07T06:57:04.0576939Z RuntimeError: If CUDA cannot be re-initialized in a forked subprocess. 2025-09-07T06:57:04.0577505Z AssertionError: If PyTorch is not compiled with CUDA support or if libcudart functions are unavailable. 2025-09-07T06:57:04.0577878Z 2025-09-07T06:57:04.0578002Z Example of CUDA operations with profiling: 2025-09-07T06:57:04.0578277Z >>> import torch 2025-09-07T06:57:04.0578532Z >>> from torch.cuda import cudart, check_error 2025-09-07T06:57:04.0578812Z >>> import os 2025-09-07T06:57:04.0579012Z >>> 2025-09-07T06:57:04.0579220Z >>> os.environ["CUDA_PROFILE"] = "1" 2025-09-07T06:57:04.0579473Z >>> 2025-09-07T06:57:04.0579694Z >>> def perform_cuda_operations_with_streams(): 2025-09-07T06:57:04.0579996Z >>> stream = torch.cuda.Stream() 2025-09-07T06:57:04.0580285Z >>> with torch.cuda.stream(stream): 2025-09-07T06:57:04.0580589Z >>> x = torch.randn(100, 100, device='cuda') 2025-09-07T06:57:04.0580892Z >>> y = torch.randn(100, 100, device='cuda') 2025-09-07T06:57:04.0581180Z >>> z = torch.mul(x, y) 2025-09-07T06:57:04.0581428Z >>> return z 2025-09-07T06:57:04.0581633Z >>> 2025-09-07T06:57:04.0581838Z >>> torch.cuda.synchronize() 2025-09-07T06:57:04.0582227Z >>> print("====== Start nsys profiling ======") 2025-09-07T06:57:04.0582553Z >>> check_error(cudart().cudaProfilerStart()) 2025-09-07T06:57:04.0582884Z >>> with torch.autograd.profiler.emit_nvtx(): 2025-09-07T06:57:04.0583241Z >>> result = perform_cuda_operations_with_streams() 2025-09-07T06:57:04.0583566Z >>> print("CUDA operations completed.") 2025-09-07T06:57:04.0583899Z >>> check_error(torch.cuda.cudart().cudaProfilerStop()) 2025-09-07T06:57:04.0584229Z >>> print("====== End nsys profiling ======") 2025-09-07T06:57:04.0584415Z 2025-09-07T06:57:04.0584593Z To run this example and save the profiling information, execute: 2025-09-07T06:57:04.0585167Z >>> $ nvprof --profile-from-start off --csv --print-summary -o 
trace_name.prof -f -- python cudart_test.py 2025-09-07T06:57:04.0585559Z 2025-09-07T06:57:04.0585771Z This command profiles the CUDA operations in the provided script and saves 2025-09-07T06:57:04.0586250Z the profiling information to a file named `trace_name.prof`. 2025-09-07T06:57:04.0586686Z The `--profile-from-start off` option ensures that profiling starts only 2025-09-07T06:57:04.0587102Z after the `cudaProfilerStart` call in the script. 2025-09-07T06:57:04.0587498Z The `--csv` and `--print-summary` options format the profiling output as a 2025-09-07T06:57:04.0587877Z CSV file and print a summary, respectively. 2025-09-07T06:57:04.0588272Z The `-o` option specifies the output file name, and the `-f` option forces the 2025-09-07T06:57:04.0588692Z overwrite of the output file if it already exists. 2025-09-07T06:57:04.0589071Z 2025-09-07T06:57:04.0589697Z Original Error: SyntaxError('invalid syntax', ('', 1, 1, '$ nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py\n', 1, 2)) 2025-09-07T06:57:04.0590309Z 2025-09-07T06:57:04.0590598Z $ nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py 2025-09-07T06:57:04.0591055Z ^ 2025-09-07T06:57:10.0431057Z msg = Cannot scrape callname=ActivationSparsifier in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py line=16. 2025-09-07T06:57:10.0432945Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:10.0433463Z 2025-09-07T06:57:10.0433843Z The Activation sparsifier class aims to sparsify/prune activations in a neural 2025-09-07T06:57:10.0434658Z network. The idea is to attach the sparsifier to a layer (or layers) and it 2025-09-07T06:57:10.0435481Z zeroes out the activations based on the mask_fn (or sparsification function) 2025-09-07T06:57:10.0436102Z input by the user. 
2025-09-07T06:57:10.0436619Z The mask_fn is applied once all the inputs are aggregated and reduced i.e. 2025-09-07T06:57:10.0437294Z mask = mask_fn(reduce_fn(aggregate_fn(activations))) 2025-09-07T06:57:10.0437670Z 2025-09-07T06:57:10.0437820Z Note:: 2025-09-07T06:57:10.0438445Z The sparsification mask is computed on the input **before it goes through the attached layer**. 2025-09-07T06:57:10.0439055Z 2025-09-07T06:57:10.0439170Z Args: 2025-09-07T06:57:10.0439378Z model (nn.Module): 2025-09-07T06:57:10.0439698Z The model whose layers will be sparsified. The layers that needs to be 2025-09-07T06:57:10.0440174Z sparsified should be added separately using the register_layer() function 2025-09-07T06:57:10.0440563Z aggregate_fn (Optional, Callable): 2025-09-07T06:57:10.0440944Z default aggregate_fn that is used if not specified while registering the layer. 2025-09-07T06:57:10.0441381Z specifies how inputs should be aggregated over time. 2025-09-07T06:57:10.0441823Z The aggregate_fn should usually take 2 torch tensors and return the aggregated tensor. 2025-09-07T06:57:10.0442218Z Example 2025-09-07T06:57:10.0442480Z def add_agg_fn(tensor1, tensor2): return tensor1 + tensor2 2025-09-07T06:57:10.0442938Z reduce_fn (Optional, Callable): 2025-09-07T06:57:10.0443319Z default reduce_fn that is used if not specified while registering the layer. 2025-09-07T06:57:10.0443814Z reduce_fn will be called on the aggregated tensor i.e. the tensor obtained after 2025-09-07T06:57:10.0444202Z calling agg_fn() on all inputs. 2025-09-07T06:57:10.0444461Z Example 2025-09-07T06:57:10.0444751Z def mean_reduce_fn(agg_tensor): return agg_tensor.mean(dim=0) 2025-09-07T06:57:10.0445097Z mask_fn (Optional, Callable): 2025-09-07T06:57:10.0445525Z default mask_fn that is used to create the sparsification mask using the tensor obtained after 2025-09-07T06:57:10.0446073Z calling the reduce_fn(). This is used by default if a custom one is passed in the 2025-09-07T06:57:10.0446451Z register_layer(). 
2025-09-07T06:57:10.0446883Z Note that the mask_fn() definition should contain the sparse arguments that is passed in sparse_config 2025-09-07T06:57:10.0447327Z arguments. 2025-09-07T06:57:10.0447556Z features (Optional, list): 2025-09-07T06:57:10.0447835Z default selected features to sparsify. 2025-09-07T06:57:10.0448270Z If this is non-empty, then the mask_fn will be applied for each feature of the input. 2025-09-07T06:57:10.0448655Z For example, 2025-09-07T06:57:10.0449023Z mask = [mask_fn(reduce_fn(aggregated_fn(input[feature])) for feature in features] 2025-09-07T06:57:10.0449414Z feature_dim (Optional, int): 2025-09-07T06:57:10.0449953Z default dimension of input features. Again, features along this dim will be chosen 2025-09-07T06:57:10.0450363Z for sparsification. 2025-09-07T06:57:10.0450636Z sparse_config (Dict): 2025-09-07T06:57:10.0450980Z Default configuration for the mask_fn. This config will be passed 2025-09-07T06:57:10.0451339Z with the mask_fn() 2025-09-07T06:57:10.0451496Z 2025-09-07T06:57:10.0451647Z Example: 2025-09-07T06:57:10.0451898Z >>> # xdoctest: +SKIP 2025-09-07T06:57:10.0452127Z >>> model = SomeModel() 2025-09-07T06:57:10.0452462Z >>> act_sparsifier = ActivationSparsifier(...) # init activation sparsifier 2025-09-07T06:57:10.0452838Z >>> # Initialize aggregate_fn 2025-09-07T06:57:10.0453075Z >>> def agg_fn(x, y): 2025-09-07T06:57:10.0453286Z >>> return x + y 2025-09-07T06:57:10.0453487Z >>> 2025-09-07T06:57:10.0453671Z >>> # Initialize reduce_fn 2025-09-07T06:57:10.0454176Z >>> def reduce_fn(x): 2025-09-07T06:57:10.0454407Z >>> return torch.mean(x, dim=0) 2025-09-07T06:57:10.0454654Z >>> 2025-09-07T06:57:10.0454832Z >>> # Initialize mask_fn 2025-09-07T06:57:10.0455058Z >>> def mask_fn(data): 2025-09-07T06:57:10.0455311Z >>> return torch.eye(data.shape).to(data.device) 2025-09-07T06:57:10.0455584Z >>> 2025-09-07T06:57:10.0455748Z >>> 2025-09-07T06:57:10.0455940Z >>> act_sparsifier.register_layer( 2025-09-07T06:57:10.0456203Z ... 
model.some_layer, 2025-09-07T06:57:10.0456457Z ... aggregate_fn=agg_fn, 2025-09-07T06:57:10.0456690Z ... reduce_fn=reduce_fn, 2025-09-07T06:57:10.0456919Z ... mask_fn=mask_fn, 2025-09-07T06:57:10.0457133Z ... ) 2025-09-07T06:57:10.0457299Z >>> 2025-09-07T06:57:10.0457480Z >>> # start training process 2025-09-07T06:57:10.0457703Z >>> for _ in [...]: 2025-09-07T06:57:10.0457913Z >>> # epoch starts 2025-09-07T06:57:10.0458180Z >>> # model.forward(), compute_loss() and model.backwards() 2025-09-07T06:57:10.0458484Z >>> # epoch ends 2025-09-07T06:57:10.0458696Z >>> act_sparsifier.step() 2025-09-07T06:57:10.0458936Z >>> # end training process 2025-09-07T06:57:10.0459179Z >>> sparsifier.squash_mask() 2025-09-07T06:57:10.0459332Z 2025-09-07T06:57:10.0459843Z Original Error: IndentationError("expected an indented block after 'for' statement on line 25", ('', 26, 1, '_._ = None\n', 26, 2)) 2025-09-07T06:57:10.0460331Z 2025-09-07T06:57:10.0460409Z _._ = None 2025-09-07T06:57:10.0460575Z ^ 2025-09-07T06:57:10.7340423Z msg = Cannot scrape callname=register_parametrization in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrize.py line=424. 2025-09-07T06:57:10.7341873Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:10.7342830Z Register a parametrization to a tensor in a module. 2025-09-07T06:57:10.7343295Z 2025-09-07T06:57:10.7343754Z Assume that ``tensor_name="weight"`` for simplicity. When accessing ``module.weight``, 2025-09-07T06:57:10.7344877Z the module will return the parametrized version ``parametrization(module.weight)``. 2025-09-07T06:57:10.7346000Z If the original tensor requires a gradient, the backward pass will differentiate 2025-09-07T06:57:10.7347006Z through :attr:`parametrization`, and the optimizer will update the tensor accordingly. 
2025-09-07T06:57:10.7347600Z 2025-09-07T06:57:10.7348056Z The first time that a module registers a parametrization, this function will add an attribute 2025-09-07T06:57:10.7349037Z ``parametrizations`` to the module of type :class:`~ParametrizationList`. 2025-09-07T06:57:10.7349566Z 2025-09-07T06:57:10.7349937Z The list of parametrizations on the tensor ``weight`` will be accessible under 2025-09-07T06:57:10.7350659Z ``module.parametrizations.weight``. 2025-09-07T06:57:10.7351002Z 2025-09-07T06:57:10.7351212Z The original tensor will be accessible under 2025-09-07T06:57:10.7352061Z ``module.parametrizations.weight.original``. 2025-09-07T06:57:10.7352452Z 2025-09-07T06:57:10.7352712Z Parametrizations may be concatenated by registering several parametrizations 2025-09-07T06:57:10.7353122Z on the same attribute. 2025-09-07T06:57:10.7353269Z 2025-09-07T06:57:10.7353476Z The training mode of a registered parametrization is updated on registration 2025-09-07T06:57:10.7353882Z to match the training mode of the host module 2025-09-07T06:57:10.7354165Z 2025-09-07T06:57:10.7354503Z Parametrized parameters and buffers have an inbuilt caching system that can be activated 2025-09-07T06:57:10.7354952Z using the context manager :func:`cached`. 2025-09-07T06:57:10.7355141Z 2025-09-07T06:57:10.7355344Z A :attr:`parametrization` may optionally implement a method with signature 2025-09-07T06:57:10.7355637Z 2025-09-07T06:57:10.7355737Z .. code-block:: python 2025-09-07T06:57:10.7355889Z 2025-09-07T06:57:10.7356077Z def right_inverse(self, X: Tensor) -> Union[Tensor, Sequence[Tensor]] 2025-09-07T06:57:10.7356352Z 2025-09-07T06:57:10.7356571Z This method is called on the unparametrized tensor when the first parametrization 2025-09-07T06:57:10.7357051Z is registered to compute the initial value of the original tensor. 2025-09-07T06:57:10.7357558Z If this method is not implemented, the original tensor will be just the unparametrized tensor. 
2025-09-07T06:57:10.7357896Z 2025-09-07T06:57:10.7358147Z If all the parametrizations registered on a tensor implement `right_inverse` it is possible 2025-09-07T06:57:10.7358717Z to initialize a parametrized tensor by assigning to it, as shown in the example below. 2025-09-07T06:57:10.7359038Z 2025-09-07T06:57:10.7359221Z It is possible for the first parametrization to depend on several inputs. 2025-09-07T06:57:10.7359684Z This may be implemented returning a tuple of tensors from ``right_inverse`` 2025-09-07T06:57:10.7360159Z (see the example implementation of a ``RankOne`` parametrization below). 2025-09-07T06:57:10.7360441Z 2025-09-07T06:57:10.7360711Z In this case, the unconstrained tensors are also located under ``module.parametrizations.weight`` 2025-09-07T06:57:10.7361186Z with names ``original0``, ``original1``,... 2025-09-07T06:57:10.7361379Z 2025-09-07T06:57:10.7361455Z .. note:: 2025-09-07T06:57:10.7361564Z 2025-09-07T06:57:10.7361911Z If unsafe=False (default) both the forward and right_inverse methods will be called 2025-09-07T06:57:10.7362360Z once to perform a number of consistency checks. 2025-09-07T06:57:10.7362792Z If unsafe=True, then right_inverse will be called if the tensor is not parametrized, 2025-09-07T06:57:10.7363199Z and nothing will be called otherwise. 2025-09-07T06:57:10.7363381Z 2025-09-07T06:57:10.7363459Z .. note:: 2025-09-07T06:57:10.7363560Z 2025-09-07T06:57:10.7363736Z In most situations, ``right_inverse`` will be a function such that 2025-09-07T06:57:10.7364095Z ``forward(right_inverse(X)) == X`` (see 2025-09-07T06:57:10.7364530Z `right inverse `_). 2025-09-07T06:57:10.7365057Z Sometimes, when the parametrization is not surjective, it may be reasonable 2025-09-07T06:57:10.7365436Z to relax this. 2025-09-07T06:57:10.7365571Z 2025-09-07T06:57:10.7365646Z .. 
warning:: 2025-09-07T06:57:10.7365772Z 2025-09-07T06:57:10.7366001Z If a parametrization depends on several inputs, :func:`~register_parametrization` 2025-09-07T06:57:10.7366542Z will register a number of new parameters. If such parametrization is registered 2025-09-07T06:57:10.7367065Z after the optimizer is created, these new parameters will need to be added manually 2025-09-07T06:57:10.7367536Z to the optimizer. See :meth:`torch.Optimizer.add_param_group`. 2025-09-07T06:57:10.7367786Z 2025-09-07T06:57:10.7367857Z Args: 2025-09-07T06:57:10.7368131Z module (nn.Module): module on which to register the parametrization 2025-09-07T06:57:10.7368649Z tensor_name (str): name of the parameter or buffer on which to register 2025-09-07T06:57:10.7369006Z the parametrization 2025-09-07T06:57:10.7369336Z parametrization (nn.Module): the parametrization to register 2025-09-07T06:57:10.7369667Z Keyword args: 2025-09-07T06:57:10.7369966Z unsafe (bool): a boolean flag that denotes whether the parametrization 2025-09-07T06:57:10.7370464Z may change the dtype and shape of the tensor. Default: `False` 2025-09-07T06:57:10.7370988Z Warning: the parametrization is not checked for consistency upon registration. 2025-09-07T06:57:10.7371397Z Enable this flag at your own risk. 
2025-09-07T06:57:10.7371586Z 2025-09-07T06:57:10.7371657Z Raises: 2025-09-07T06:57:10.7371998Z ValueError: if the module does not have a parameter or a buffer named :attr:`tensor_name` 2025-09-07T06:57:10.7372323Z 2025-09-07T06:57:10.7372396Z Examples: 2025-09-07T06:57:10.7372636Z >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_LAPACK) 2025-09-07T06:57:10.7372931Z >>> import torch 2025-09-07T06:57:10.7373161Z >>> import torch.nn as nn 2025-09-07T06:57:10.7373443Z >>> import torch.nn.utils.parametrize as P 2025-09-07T06:57:10.7373722Z >>> 2025-09-07T06:57:10.7374028Z >>> class Symmetric(nn.Module): 2025-09-07T06:57:10.7374297Z >>> def forward(self, X): 2025-09-07T06:57:10.7374614Z >>> return X.triu() + X.triu(1).T # Return a symmetric matrix 2025-09-07T06:57:10.7374918Z >>> 2025-09-07T06:57:10.7375109Z >>> def right_inverse(self, A): 2025-09-07T06:57:10.7375375Z >>> return A.triu() 2025-09-07T06:57:10.7375601Z >>> 2025-09-07T06:57:10.7375779Z >>> m = nn.Linear(5, 5) 2025-09-07T06:57:10.7376082Z >>> P.register_parametrization(m, "weight", Symmetric()) 2025-09-07T06:57:10.7376516Z >>> print(torch.allclose(m.weight, m.weight.T)) # m.weight is now symmetric 2025-09-07T06:57:10.7376876Z True 2025-09-07T06:57:10.7377067Z >>> A = torch.rand(5, 5) 2025-09-07T06:57:10.7377319Z >>> A = A + A.T # A is now symmetric 2025-09-07T06:57:10.7377664Z >>> m.weight = A # Initialize the weight to be the symmetric matrix A 2025-09-07T06:57:10.7378017Z >>> print(torch.allclose(m.weight, A)) 2025-09-07T06:57:10.7378368Z True 2025-09-07T06:57:10.7378492Z 2025-09-07T06:57:10.7378586Z >>> class RankOne(nn.Module): 2025-09-07T06:57:10.7378851Z >>> def forward(self, x, y): 2025-09-07T06:57:10.7379147Z >>> # Form a rank 1 matrix multiplying two vectors 2025-09-07T06:57:10.7379468Z >>> return x.unsqueeze(-1) @ y.unsqueeze(-2) 2025-09-07T06:57:10.7379741Z >>> 2025-09-07T06:57:10.7379932Z >>> def right_inverse(self, Z): 2025-09-07T06:57:10.7380208Z >>> # Project Z onto the rank 1 matrices 
2025-09-07T06:57:10.7380540Z >>> U, S, Vh = torch.linalg.svd(Z, full_matrices=False) 2025-09-07T06:57:10.7380866Z >>> # Return rescaled singular vectors 2025-09-07T06:57:10.7381150Z >>> s0_sqrt = S[0].sqrt().unsqueeze(-1) 2025-09-07T06:57:10.7381460Z >>> return U[..., :, 0] * s0_sqrt, Vh[..., 0, :] * s0_sqrt 2025-09-07T06:57:10.7381747Z >>> 2025-09-07T06:57:10.7381974Z >>> linear_rank_one = P.register_parametrization( 2025-09-07T06:57:10.7382293Z ... nn.Linear(4, 4), "weight", RankOne() 2025-09-07T06:57:10.7382560Z ... ) 2025-09-07T06:57:10.7382832Z >>> print(torch.linalg.matrix_rank(linear_rank_one.weight).item()) 2025-09-07T06:57:10.7383156Z 1 2025-09-07T06:57:10.7383254Z 2025-09-07T06:57:10.7383329Z 2025-09-07T06:57:10.7383854Z Original Error: IndentationError('expected an indented block after function definition on line 2', ('', 3, 0, '_._ = None\n', 3, -1)) 2025-09-07T06:57:10.7384355Z 2025-09-07T06:57:10.7384431Z _._ = None 2025-09-07T06:57:10.7384693Z ^ 2025-09-07T06:57:11.6146244Z msg = Cannot scrape callname=DeviceMesh.__getitem__ in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py line=701. 2025-09-07T06:57:11.6147637Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:11.6148159Z 2025-09-07T06:57:11.6148571Z Slice the current DeviceMesh based on the mesh_dim_names given to create a submesh. 2025-09-07T06:57:11.6149738Z The submesh created consists of the dimensions and the communicators indicated by 2025-09-07T06:57:11.6150555Z ``mesh_dim_names`` 2025-09-07T06:57:11.6150772Z 2025-09-07T06:57:11.6150898Z Args: 2025-09-07T06:57:11.6151422Z mesh_dim_names (Union[str, Tuple[str]]): the name or the tuple of names of the 2025-09-07T06:57:11.6152202Z mesh dimension of the DeviceMesh to create the submesh for. 
2025-09-07T06:57:11.6152756Z Returns: 2025-09-07T06:57:11.6153078Z A :class:`DeviceMesh` object 2025-09-07T06:57:11.6153359Z 2025-09-07T06:57:11.6153780Z The following program runs on each process/rank in an SPMD manner in a world size of 8. 2025-09-07T06:57:11.6154488Z In the first example: 2025-09-07T06:57:11.6155073Z Calling mesh_2d["tp"] on rank 0, 1, 2, 3 returns a 1D submesh of DeviceMesh:([0, 1, 2, 3]). 2025-09-07T06:57:11.6155938Z Calling mesh_2d["tp"] on rank 4, 5, 6, 7 returns a 1D submesh of DeviceMesh:([4, 5, 6, 7]). 2025-09-07T06:57:11.6156765Z Calling mesh_2d["dp"] on rank 0, 4 returns a 1D submesh of DeviceMesh:([0, 4]). 2025-09-07T06:57:11.6157536Z Calling mesh_2d["dp"] on rank 1, 5 returns a 1D submesh of DeviceMesh:([1, 5]). 2025-09-07T06:57:11.6158345Z Calling mesh_2d["dp"] on rank 2, 6 returns a 1D submesh of DeviceMesh:([2, 6]). 2025-09-07T06:57:11.6159227Z Calling mesh_2d["dp"] on rank 3, 7 returns a 1D submesh of DeviceMesh:([3, 7]). 2025-09-07T06:57:11.6159763Z 2025-09-07T06:57:11.6159930Z In the second example: 2025-09-07T06:57:11.6160652Z Calling mesh_3d["dp", "cp"] on rank 0, 1, 4, 5 returns a 2D submesh of DeviceMesh:([[0, 1], [4, 5]]). 2025-09-07T06:57:11.6161708Z Calling mesh_3d["dp", "cp"] on rank 2, 3, 6, 7 returns a 2D submesh of DeviceMesh:([[2, 3], [6, 7]]). 2025-09-07T06:57:11.6162500Z Calling mesh_3d["cp", "dp"] on rank 0, 1, 4, 5 returns a 2D submesh of DeviceMesh:([[0, 4], [1, 5]]). 2025-09-07T06:57:11.6163216Z Calling mesh_3d["cp", "dp"] on rank 2, 3, 6, 7 returns a 2D submesh of DeviceMesh:([[2, 6], [3, 7]]). 
2025-09-07T06:57:11.6163552Z 2025-09-07T06:57:11.6163642Z Example:: 2025-09-07T06:57:11.6163746Z 2025-09-07T06:57:11.6163844Z >>> # xdoctest: +SKIP("no rank") 2025-09-07T06:57:11.6164159Z >>> from torch.distributed.device_mesh import DeviceMesh 2025-09-07T06:57:11.6164461Z >>> 2025-09-07T06:57:11.6164739Z >>> # Initialize a 2D device mesh as (2, 4) to represent the topology 2025-09-07T06:57:11.6165124Z >>> # of cross-host(dim 0), and within-host (dim 1). 2025-09-07T06:57:11.6165545Z >>> mesh_2d = init_device_mesh(device_type="cuda", (2,4), mesh_dim_names=("dp", "tp")) 2025-09-07T06:57:11.6165945Z >>> tp_mesh = mesh_2d["tp"] 2025-09-07T06:57:11.6166185Z >>> dp_mesh = mesh_2d["dp"] 2025-09-07T06:57:11.6166402Z >>> 2025-09-07T06:57:11.6166581Z >>> # Initialize a 3D mesh. 2025-09-07T06:57:11.6166961Z >>> mesh_3d = init_device_mesh(device_type="cuda", (2,2,2), mesh_dim_names=("dp", "pp", "cp")) 2025-09-07T06:57:11.6167543Z >>> # The order of the mesh_dim_names provided deteremines the order of dimensions in the submesh. 2025-09-07T06:57:11.6167987Z >>> dp_cp_mesh = mesh_3d["dp", "cp"] 2025-09-07T06:57:11.6168260Z >>> cp_dp_mesh = mesh_3d["cp", "dp"] 2025-09-07T06:57:11.6168432Z 2025-09-07T06:57:11.6168995Z Original Error: SyntaxError('positional argument follows keyword argument', ('', 6, 82, 'mesh_2d = init_device_mesh(device_type="cuda", (2,4), mesh_dim_names=("dp", "tp"))\n', 6, 83)) 2025-09-07T06:57:11.6169643Z 2025-09-07T06:57:11.6169856Z mesh_2d = init_device_mesh(device_type="cuda", (2,4), mesh_dim_names=("dp", "tp")) 2025-09-07T06:57:11.6170344Z ^ 2025-09-07T06:57:11.7737122Z msg = Cannot scrape callname=SavePlanner in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py line=122. 
2025-09-07T06:57:11.7738497Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:11.7739026Z 2025-09-07T06:57:11.7739666Z Abstract class defining the protocol used by save_state_dict to plan the save process. 2025-09-07T06:57:11.7740403Z 2025-09-07T06:57:11.7741271Z SavePlanners are stateful objects that can be used to customize the whole save process. 2025-09-07T06:57:11.7742099Z 2025-09-07T06:57:11.7742930Z SavePlanner acts as an access proxy to the state_dict, so any transformation done to it 2025-09-07T06:57:11.7744011Z will be visible to the whole process. 2025-09-07T06:57:11.7744419Z 2025-09-07T06:57:11.7745218Z A planner subclass can expect the following sequence of calls during save_state_dict: 2025-09-07T06:57:11.7745652Z 2025-09-07T06:57:11.7745886Z 1) set_up_planner - called on all ranks. 2025-09-07T06:57:11.7746289Z Signals the start of a checkpoint save. 2025-09-07T06:57:11.7746581Z 2025-09-07T06:57:11.7746856Z 2) create_local_plan - called on all ranks. 2025-09-07T06:57:11.7747431Z Process the state_dict and produces a `SavePlan` that will be sent for global planning. 2025-09-07T06:57:11.7747819Z 2025-09-07T06:57:11.7748115Z 3) create_global_plan - called on the coordinator rank only. 2025-09-07T06:57:11.7748701Z Takes the SavePlan from all ranks and make any global decision. 2025-09-07T06:57:11.7749002Z 2025-09-07T06:57:11.7749183Z 4) finish_plan - called on all ranks. 2025-09-07T06:57:11.7749750Z This gives each rank a chance to adjust to global planning decisions. 2025-09-07T06:57:11.7750159Z 2025-09-07T06:57:11.7750377Z 5) resolve_data - called multiple times on each rank 2025-09-07T06:57:11.7750866Z Lookups a value on the `state_dict` for the storage layer to write. 2025-09-07T06:57:11.7751224Z 2025-09-07T06:57:11.7751598Z Users are recommended to extend DefaultSavePlanner instead of this interface directly as 2025-09-07T06:57:11.7752193Z most changes can be expressed by changes in a single method. 
2025-09-07T06:57:11.7763043Z 2025-09-07T06:57:11.7763238Z There are 3 usual patterns of extension: 2025-09-07T06:57:11.7763638Z 2025-09-07T06:57:11.7763901Z Rewriting state_dict. This is the simplest way to extend the save process as it 2025-09-07T06:57:11.7764414Z doesn't requite understanding the intrincacies of how SavePlan works: 2025-09-07T06:57:11.7764681Z 2025-09-07T06:57:11.7764789Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:11.7765100Z >>> class RenamePlanner(DefaultSavePlanner): 2025-09-07T06:57:11.7765395Z >>> def set_up_planner( 2025-09-07T06:57:11.7765616Z >>> self, 2025-09-07T06:57:11.7765821Z >>> state_dict: STATE_DICT_TYPE, 2025-09-07T06:57:11.7766101Z >>> storage_meta: Optional[StorageMeta], 2025-09-07T06:57:11.7766383Z >>> is_coordinator: bool, 2025-09-07T06:57:11.7766625Z >>> ) -> None: 2025-09-07T06:57:11.7766852Z >>> # prefix all keys with `foo_`` 2025-09-07T06:57:11.7767292Z >>> super().set_up_planner({"foo_" + k: v for k, v in state_dict.items()}, storage_meta, is_coordinator) 2025-09-07T06:57:11.7767636Z 2025-09-07T06:57:11.7767929Z Modifying local plan and lookup in tandem. 
This is useful when fine control of how data is persisted 2025-09-07T06:57:11.7768309Z 2025-09-07T06:57:11.7768412Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:11.7768708Z >>> class FP16Planner(DefaultSavePlanner): 2025-09-07T06:57:11.7768999Z >>> def create_local_plan(self): 2025-09-07T06:57:11.7769277Z >>> plan = super().create_local_plan() 2025-09-07T06:57:11.7769563Z >>> for p in plan: 2025-09-07T06:57:11.7769810Z >>> if p.tensor_data is not None: 2025-09-07T06:57:11.7770146Z >>> p.tensor_data.properties.dtype = torch.float16 2025-09-07T06:57:11.7770594Z >>> return plan 2025-09-07T06:57:11.7770803Z >>> 2025-09-07T06:57:11.7771002Z >>> def resolve_data(self, write_item): 2025-09-07T06:57:11.7771296Z >>> item = super().resolve_data(write_item) 2025-09-07T06:57:11.7771723Z >>> return item if write_item.type == WriteItemType.BYTE_IO else item.to(torch.float16) 2025-09-07T06:57:11.7772042Z 2025-09-07T06:57:11.7772411Z Using the global planning step to make central decisions that can't be made individually by each rank 2025-09-07T06:57:11.7772864Z 2025-09-07T06:57:11.7772973Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:11.7773258Z >>> from itertools import zip_longest 2025-09-07T06:57:11.7773530Z >>> from dataclasses import replace 2025-09-07T06:57:11.7774092Z >>> class DDPLoadBalancingPlanner(DefaultSavePlanner): 2025-09-07T06:57:11.7774574Z >>> # This uses the default local plan behavior of having all non-sharded writes in rank 0 2025-09-07T06:57:11.7775017Z >>> # This sample doesn't handle ShardedTensors 2025-09-07T06:57:11.7775318Z >>> def create_global_plan(self, all_plans): 2025-09-07T06:57:11.7775646Z >>> iters = [iter(all_plans[0].items)] * len(all_plans) 2025-09-07T06:57:11.7775953Z >>> items_per_rank = [ 2025-09-07T06:57:11.7776224Z >>> [item for item in items if item is not None] 2025-09-07T06:57:11.7776565Z >>> for items in zip(*zip_longest(*iters), strict=True) 2025-09-07T06:57:11.7776866Z >>> ] 2025-09-07T06:57:11.7777060Z >>> 
all_plans = [ 2025-09-07T06:57:11.7777293Z >>> replace(plan, items=items) 2025-09-07T06:57:11.7777632Z >>> for plan, items in zip(all_plans, items_per_rank, strict=True) 2025-09-07T06:57:11.7777964Z >>> ] 2025-09-07T06:57:11.7778193Z >>> return super().create_global_plan(all_plans) 2025-09-07T06:57:11.7778403Z 2025-09-07T06:57:11.7778629Z Finally, some planners need to save additional metadata in the checkpoint, this is 2025-09-07T06:57:11.7779168Z accomplished by having each rank contribute their data items in the local plan and 2025-09-07T06:57:11.7779577Z the global planner aggregate them: 2025-09-07T06:57:11.7779757Z 2025-09-07T06:57:11.7779850Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:11.7780169Z >>> class SaveExtraDataPlanner(DefaultSavePlanner): 2025-09-07T06:57:11.7780607Z >>> def create_local_plan(self) -> SavePlan: 2025-09-07T06:57:11.7780912Z >>> plan = super().create_local_plan() 2025-09-07T06:57:11.7781243Z >>> return replace(plan, planner_data="per-rank-data") 2025-09-07T06:57:11.7781549Z >>> 2025-09-07T06:57:11.7781896Z >>> def create_global_plan(self, all_plans: List[SavePlan]) -> Tuple[List[SavePlan], Metadata]: 2025-09-07T06:57:11.7782414Z >>> global_plan, metadata = super().create_global_plan(all_plans) 2025-09-07T06:57:11.7782807Z >>> merged_data = [p.planner_data for p in global_plan] 2025-09-07T06:57:11.7783177Z >>> metadata = replace(metadata, planner_data=merged_data) 2025-09-07T06:57:11.7783507Z >>> return global_plan, metadata 2025-09-07T06:57:11.7783691Z 2025-09-07T06:57:11.7784136Z Original Error: IndentationError('expected an indented block after function definition on line 3', ('', 9, 0, '_._ = None\n', 9, -1)) 2025-09-07T06:57:11.7784659Z 2025-09-07T06:57:11.7784729Z _._ = None 2025-09-07T06:57:11.7784899Z ^ 2025-09-07T06:57:11.7785521Z msg = Cannot scrape callname=LoadPlanner in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py line=305. 
2025-09-07T06:57:11.7786305Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:11.7786610Z 2025-09-07T06:57:11.7786847Z Abstract class defining the protocol used by load_state_dict to plan the load process. 2025-09-07T06:57:11.7787174Z 2025-09-07T06:57:11.7787394Z LoadPlanner are stateful objects that can be used to customize the whole load process. 2025-09-07T06:57:11.7787709Z 2025-09-07T06:57:11.7788024Z LoadPlanner acts as an access proxy to the state_dict, so any transformation done to it 2025-09-07T06:57:11.7788443Z will be visible to the whole process. 2025-09-07T06:57:11.7788637Z 2025-09-07T06:57:11.7788871Z A planner subclass can expect the following sequence of calls during load_state_dict: 2025-09-07T06:57:11.7789205Z 2025-09-07T06:57:11.7789310Z 1) set_up_planner - called on all ranks. 2025-09-07T06:57:11.7789696Z Signals the start of loading a checkpoint. 2025-09-07T06:57:11.7789979Z 2025-09-07T06:57:11.7790094Z 2) create_local_plan - called on all ranks. 2025-09-07T06:57:11.7790527Z Process the state_dict and produces a `LoadPlan` that will be sent for global planning. 2025-09-07T06:57:11.7790865Z 2025-09-07T06:57:11.7791030Z 3) create_global_plan - called on the coordinator rank only. 2025-09-07T06:57:11.7791436Z Takes the LoadPlan from all ranks and make any global decision. 2025-09-07T06:57:11.7791698Z 2025-09-07T06:57:11.7791822Z 4) load_bytes - called multiple times on each rank 2025-09-07T06:57:11.7792183Z This is called once per non-tensor value in state_dict. 2025-09-07T06:57:11.7792405Z 2025-09-07T06:57:11.7792597Z 5) resolve_tensor and commit_tensor - called multiple times on each rank 2025-09-07T06:57:11.7793035Z They are called in pair for each Tensor value in state_dict. 
2025-09-07T06:57:11.7793281Z 2025-09-07T06:57:11.7793535Z Users are recommended to extend DefaultLoadPlanner instead of this interface directly as 2025-09-07T06:57:11.7794031Z most changes can be expressed by changes in a single method. 2025-09-07T06:57:11.7794271Z 2025-09-07T06:57:11.7794375Z There are two usual patterns of extension: 2025-09-07T06:57:11.7794565Z 2025-09-07T06:57:11.7794773Z Rewriting state_dict. This is the simplest way to extend the load process as it 2025-09-07T06:57:11.7795277Z doesn't requite understanding the intrincacies of how LoadPlan works. We need 2025-09-07T06:57:11.7795762Z to keep a reference to the original state_dict as load happens in place so 2025-09-07T06:57:11.7796144Z we need to be able to perform it in place 2025-09-07T06:57:11.7796335Z 2025-09-07T06:57:11.7796435Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:11.7796736Z >>> class RenamePlanner(DefaultLoadPlanner): 2025-09-07T06:57:11.7797019Z >>> def set_up_planner( 2025-09-07T06:57:11.7797245Z >>> self, 2025-09-07T06:57:11.7797535Z >>> state_dict: STATE_DICT_TYPE, 2025-09-07T06:57:11.7797807Z >>> metadata: Metadata, 2025-09-07T06:57:11.7798054Z >>> is_coordinator: bool, 2025-09-07T06:57:11.7798295Z >>> ) -> None: 2025-09-07T06:57:11.7798511Z >>> self.original_state_dict = state_dict 2025-09-07T06:57:11.7798858Z >>> state_dict = {"foo_" + k: v for k, v in state_dict.items()} 2025-09-07T06:57:11.7799168Z >>> 2025-09-07T06:57:11.7799361Z >>> if self.flatten_sharded_tensors: 2025-09-07T06:57:11.7799675Z >>> state_dict = _flatten_sharded_tensors(state_dict) 2025-09-07T06:57:11.7799967Z >>> 2025-09-07T06:57:11.7800149Z >>> if self.flatten_state_dict: 2025-09-07T06:57:11.7800481Z >>> state_dict, self.mappings = flatten_state_dict(state_dict) 2025-09-07T06:57:11.7800798Z >>> 2025-09-07T06:57:11.7800978Z >>> self.state_dict = state_dict 2025-09-07T06:57:11.7801246Z >>> self.metadata = metadata 2025-09-07T06:57:11.7801512Z >>> self.is_coordinator = is_coordinator 
2025-09-07T06:57:11.7801779Z >>> 2025-09-07T06:57:11.7801974Z >>> def load_bytes(self, read_item, value): 2025-09-07T06:57:11.7802254Z >>> # Remove the "foo_" prefix 2025-09-07T06:57:11.7802674Z >>> self.original_state_dict[read_item.dest_index.fqn[4:]] = torch.load(value, weights_only=False) 2025-09-07T06:57:11.7803039Z 2025-09-07T06:57:11.7803043Z 2025-09-07T06:57:11.7803255Z Modifying resolve_tensor and commit_tensor to handle load time transformation. 2025-09-07T06:57:11.7803572Z 2025-09-07T06:57:11.7803664Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:11.7803972Z >>> class MetaModelMaterialize(DefaultSavePlanner): 2025-09-07T06:57:11.7804371Z >>> def resolve_tensor(self, read_item): 2025-09-07T06:57:11.7804663Z >>> tensor = super().resolve_tensor(read_item) 2025-09-07T06:57:11.7804985Z >>> return torch.empty_like(tensor, device="cpu") 2025-09-07T06:57:11.7805263Z >>> 2025-09-07T06:57:11.7805462Z >>> def commit_tensor(self, read_item, tensor): 2025-09-07T06:57:11.7805787Z >>> self.state_dict[read_item.dest_index.fqn] = tensor 2025-09-07T06:57:11.7806154Z 2025-09-07T06:57:11.7806604Z Original Error: IndentationError('expected an indented block after function definition on line 22', ('', 23, 0, '_._ = None\n', 23, -1)) 2025-09-07T06:57:11.7807136Z 2025-09-07T06:57:11.7807205Z _._ = None 2025-09-07T06:57:11.7807375Z ^ 2025-09-07T06:57:12.0685818Z msg = Cannot scrape callname=FullStateDictConfig in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/api.py line=295. 2025-09-07T06:57:12.0687163Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:12.0687694Z 2025-09-07T06:57:12.0687995Z ``FullStateDictConfig`` is a config class meant to be used with 2025-09-07T06:57:12.0688715Z ``StateDictType.FULL_STATE_DICT``. 
We recommend enabling both 2025-09-07T06:57:12.0689428Z ``offload_to_cpu=True`` and ``rank0_only=True`` when saving full state 2025-09-07T06:57:12.0690177Z dicts to save GPU memory and CPU memory, respectively. This config class 2025-09-07T06:57:12.0690923Z is meant to be used via the :func:`state_dict_type` context manager as 2025-09-07T06:57:12.0691513Z follows: 2025-09-07T06:57:12.0691704Z 2025-09-07T06:57:12.0691893Z >>> # xdoctest: +SKIP("undefined variables") 2025-09-07T06:57:12.0692575Z >>> from torch.distributed.fsdp import FullyShardedDataParallel as FSDP 2025-09-07T06:57:12.0693238Z >>> fsdp = FSDP(model, auto_wrap_policy=...) 2025-09-07T06:57:12.0693993Z >>> cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True) 2025-09-07T06:57:12.0694736Z >>> with FSDP.state_dict_type(fsdp, StateDictType.FULL_STATE_DICT, cfg): 2025-09-07T06:57:12.0695362Z >>> state = fsdp.state_dict() 2025-09-07T06:57:12.0695965Z >>> # `state` will be empty on non rank 0 and contain CPU tensors on rank 0. 2025-09-07T06:57:12.0696789Z >>> # To reload checkpoint for inference, finetuning, transfer learning, etc: 2025-09-07T06:57:12.0698031Z >>> model = model_fn() # Initialize model in preparation for wrapping with FSDP 2025-09-07T06:57:12.0698803Z >>> if dist.get_rank() == 0: 2025-09-07T06:57:12.0699457Z >>> # Load checkpoint only on rank 0 to avoid memory redundancy 2025-09-07T06:57:12.0700227Z >>> state_dict = torch.load("my_checkpoint.pt") 2025-09-07T06:57:12.0700837Z >>> model.load_state_dict(state_dict) 2025-09-07T06:57:12.0701291Z >>> # All ranks initialize FSDP module as usual. `sync_module_states` argument 2025-09-07T06:57:12.0701777Z >>> # communicates loaded checkpoint states from rank 0 to rest of the world. 2025-09-07T06:57:12.0702156Z >>> fsdp = FSDP( 2025-09-07T06:57:12.0702369Z ... model, 2025-09-07T06:57:12.0702607Z ... device_id=torch.cuda.current_device(), 2025-09-07T06:57:12.0702903Z ... auto_wrap_policy=..., 2025-09-07T06:57:12.0703168Z ... 
sync_module_states=True, 2025-09-07T06:57:12.0703421Z ... ) 2025-09-07T06:57:12.0703701Z >>> # After this point, all ranks have FSDP model with loaded checkpoint. 2025-09-07T06:57:12.0703985Z 2025-09-07T06:57:12.0704069Z Attributes: 2025-09-07T06:57:12.0704345Z rank0_only (bool): If ``True``, then only rank 0 saves the full state 2025-09-07T06:57:12.0704785Z dict, and nonzero ranks save an empty dict. If ``False``, then all 2025-09-07T06:57:12.0705182Z ranks save the full state dict. (Default: ``False``) 2025-09-07T06:57:12.0705418Z 2025-09-07T06:57:12.0705835Z Original Error: IndentationError("expected an indented block after 'if' statement on line 10", ('', 11, 1, '_._ = None\n', 11, 2)) 2025-09-07T06:57:12.0706435Z 2025-09-07T06:57:12.0706510Z _._ = None 2025-09-07T06:57:12.0706711Z ^ 2025-09-07T06:57:12.6643266Z msg = Cannot scrape callname=unsafe_generate_fake_kernels in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/fake_profile.py line=94. 2025-09-07T06:57:12.6644648Z Caused by: DoctestParseError('Failed to parse doctest in _label_docsrc_lines') 2025-09-07T06:57:12.6645191Z 2025-09-07T06:57:12.6645817Z Registers a fake kernel based on the given operator profiles. This fake 2025-09-07T06:57:12.6646793Z kernel registration will override any existing fake kernel registrations. 2025-09-07T06:57:12.6647298Z 2025-09-07T06:57:12.6647603Z The input is a dictionary mapping operator names to a set of operator 2025-09-07T06:57:12.6648473Z profiles, which we will use to generate fake kernels. The operator profiles 2025-09-07T06:57:12.6649368Z are a record of the input and output tensor metadata. Based on this 2025-09-07T06:57:12.6650254Z information we will match a given input to the recorded profile, and return 2025-09-07T06:57:12.6651203Z an output with the same metadata as in the recorded profile. If a profile 2025-09-07T06:57:12.6652021Z doesn't exist then an exception will be thrown. 
2025-09-07T06:57:12.6652288Z 2025-09-07T06:57:12.6652513Z The fake kernel generation is considered unsafe because it relies on the 2025-09-07T06:57:12.6652989Z rigid, pre-defined operator profiles that do not account for potential 2025-09-07T06:57:12.6653459Z variations in output behavior. Specifically, the generated kernels assume a 2025-09-07T06:57:12.6654059Z fixed relationship between input and output ranks. However, in reality, it's 2025-09-07T06:57:12.6654548Z possible that data-dependent operations may produce outputs of different 2025-09-07T06:57:12.6655005Z ranks even when given inputs of the same rank. The generated fake kernels 2025-09-07T06:57:12.6655443Z are inflexible and unable to accommodate these nuances, making them 2025-09-07T06:57:12.6655789Z potentially unsafe. 2025-09-07T06:57:12.6655917Z 2025-09-07T06:57:12.6655986Z Args: 2025-09-07T06:57:12.6656256Z op_profiles (dict[str, set[OpProfile]]): A dictionary mapping operator 2025-09-07T06:57:12.6656682Z name to a set of operator profiles from which we will generate fake 2025-09-07T06:57:12.6656998Z kernels. 
2025-09-07T06:57:12.6657113Z 2025-09-07T06:57:12.6657185Z Examples: 2025-09-07T06:57:12.6657396Z 2025-09-07T06:57:12.6657546Z >>> # Example: Registering an op-profile from draft-export 2025-09-07T06:57:12.6657850Z >>> import torch 2025-09-07T06:57:12.6658109Z >>> from torch.export._draft_export import draft_export 2025-09-07T06:57:12.6658396Z >>> 2025-09-07T06:57:12.6658650Z >>> @torch.library.custom_op("mylib::foo", mutates_args=()) 2025-09-07T06:57:12.6658997Z >>> def foo(x: Tensor, y: Tensor) -> Tensor: 2025-09-07T06:57:12.6659263Z >>> return x + y 2025-09-07T06:57:12.6659458Z >>> 2025-09-07T06:57:12.6659642Z >>> class M(torch.nn.Module): 2025-09-07T06:57:12.6659892Z >>> def forward(self, a, b): 2025-09-07T06:57:12.6660174Z >>> res = torch.ops.mylib.foo(a, b) # no fake impl 2025-09-07T06:57:12.6660454Z >>> return res 2025-09-07T06:57:12.6660662Z >>> 2025-09-07T06:57:12.6660902Z >>> ep = draft_export(M(), (torch.ones(3, 4), torch.ones(3, 4)) 2025-09-07T06:57:12.6661197Z >>> 2025-09-07T06:57:12.6661530Z >>> with torch._library.fake_profile.unsafe_generate_fake_kernels(ep._report.op_profiles): 2025-09-07T06:57:12.6661970Z >>> decomp = ep.run_decompositions() 2025-09-07T06:57:12.6662152Z 2025-09-07T06:57:12.6662156Z 2025-09-07T06:57:12.6662544Z Original Error: IncompleteParseError('ill-formed doctest: all parts have been processed but the doctest source is not balanced') 2025-09-07T06:57:12.6663000Z 2025-09-07T06:57:12.6810105Z msg = Cannot scrape callname=CustomOpDef.register_fake in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py line=397. 2025-09-07T06:57:12.6811601Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:12.6812336Z Register a FakeTensor implementation for this custom op. 2025-09-07T06:57:12.6812722Z 2025-09-07T06:57:12.6813066Z This is necessary to get the operator to work efficiently with torch.compile. 
2025-09-07T06:57:12.6813555Z 2025-09-07T06:57:12.6813979Z The Fake impl (sometimes also known as a meta kernel or abstract impl) 2025-09-07T06:57:12.6814949Z specifies the behavior of this operator on Tensors that carry no data. 2025-09-07T06:57:12.6815757Z Given some input Tensors with certain properties 2025-09-07T06:57:12.6816459Z (sizes/strides/storage_offset/device), it specifies what the properties of 2025-09-07T06:57:12.6817103Z the output Tensors are. 2025-09-07T06:57:12.6817361Z 2025-09-07T06:57:12.6817653Z Please see :func:`torch.library.register_fake` for more details. 2025-09-07T06:57:12.6818084Z 2025-09-07T06:57:12.6818229Z Args: 2025-09-07T06:57:12.6818729Z fn (Callable): The function to register as the FakeTensor 2025-09-07T06:57:12.6819349Z implementation. 2025-09-07T06:57:12.6819651Z 2025-09-07T06:57:12.6819796Z Examples: 2025-09-07T06:57:12.6820186Z >>> import torch 2025-09-07T06:57:12.6820644Z >>> import numpy as np 2025-09-07T06:57:12.6821150Z >>> from torch import Tensor 2025-09-07T06:57:12.6821634Z >>> 2025-09-07T06:57:12.6822186Z >>> # Example 1: an operator without data-dependent output shape 2025-09-07T06:57:12.6822743Z >>> @torch.library.custom_op("mylib::linear", mutates_args=()) 2025-09-07T06:57:12.6823162Z >>> def linear(x: Tensor, weight: Tensor, bias: Tensor) -> Tensor: 2025-09-07T06:57:12.6823505Z >>> return (x @ weight.t()) + bias 2025-09-07T06:57:12.6823754Z >>> 2025-09-07T06:57:12.6823946Z >>> @linear.register_fake 2025-09-07T06:57:12.6824200Z >>> def _(x, weight, bias): 2025-09-07T06:57:12.6824444Z >>> assert x.dim() == 2 2025-09-07T06:57:12.6824697Z >>> assert weight.dim() == 2 2025-09-07T06:57:12.6824956Z >>> assert bias.dim() == 1 2025-09-07T06:57:12.6825235Z >>> assert x.shape[1] == weight.shape[1] 2025-09-07T06:57:12.6825622Z >>> assert weight.shape[0] == bias.shape[0] 2025-09-07T06:57:12.6825926Z >>> assert x.device == weight.device 2025-09-07T06:57:12.6826234Z >>> return x.new_empty(x.size(0), weight.size(0)) 
2025-09-07T06:57:12.6826509Z >>> 2025-09-07T06:57:12.6826705Z >>> x = torch.randn(2, 2) 2025-09-07T06:57:12.6826951Z >>> weight = torch.randn(2, 2) 2025-09-07T06:57:12.6827200Z >>> bias = torch.randn(2) 2025-09-07T06:57:12.6827466Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:12.6827818Z >>> out = torch.compile(linear, fullgraph=True)(x, weight, bias) 2025-09-07T06:57:12.6828166Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:12.6828559Z >>> assert torch.allclose(out, torch.nn.functional.linear(x, weight, bias)) 2025-09-07T06:57:12.6828915Z >>> 2025-09-07T06:57:12.6829166Z >>> # Example 2: an operator with data-dependent output shape 2025-09-07T06:57:12.6829560Z >>> @torch.library.custom_op("mylib::nonzero", mutates_args=()) 2025-09-07T06:57:12.6829911Z >>> def nonzero(x: Tensor) -> Tensor: 2025-09-07T06:57:12.6830181Z >>> x_np = x.cpu().numpy() 2025-09-07T06:57:12.6830453Z >>> res = np.stack(np.nonzero(x_np), axis=1) 2025-09-07T06:57:12.6830754Z >>> return torch.tensor(res, device=x.device) 2025-09-07T06:57:12.6831016Z >>> 2025-09-07T06:57:12.6831206Z >>> @nonzero.register_fake 2025-09-07T06:57:12.6831449Z >>> def _(x): 2025-09-07T06:57:12.6831703Z >>> # Number of nonzero-elements is data-dependent. 2025-09-07T06:57:12.6832158Z >>> # Since we cannot peek at the data in an abstract impl, 2025-09-07T06:57:12.6832506Z >>> # we use the ctx object to construct a new symint that 2025-09-07T06:57:12.6832828Z >>> # represents the data-dependent size. 
2025-09-07T06:57:12.6833120Z >>> ctx = torch.library.get_ctx() 2025-09-07T06:57:12.6833468Z >>> nnz = ctx.new_dynamic_size() 2025-09-07T06:57:12.6833792Z >>> shape = [nnz, x.dim()] 2025-09-07T06:57:12.6834083Z >>> result = x.new_empty(shape, dtype=torch.int64) 2025-09-07T06:57:12.6834376Z >>> return result 2025-09-07T06:57:12.6834593Z >>> 2025-09-07T06:57:12.6834791Z >>> x = torch.tensor([0, 1, 2, 0, 0, 1]) 2025-09-07T06:57:12.6835079Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:12.6835397Z >>> out = torch.compile(nonzero, fullgraph=True)(x) 2025-09-07T06:57:12.6835708Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:12.6836004Z >>> assert torch.allclose(out, x.nonzero()) 2025-09-07T06:57:12.6836198Z 2025-09-07T06:57:12.6836266Z 2025-09-07T06:57:12.6836774Z Original Error: IndentationError('expected an indented block after function definition on line 36', ('', 37, 1, '_._ = None\n', 37, 2)) 2025-09-07T06:57:12.6837266Z 2025-09-07T06:57:12.6837338Z _._ = None 2025-09-07T06:57:12.6837504Z ^ 2025-09-07T06:57:15.4140496Z msg = Cannot scrape callname=vmap in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/apis.py line=39. 2025-09-07T06:57:15.4141740Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:15.4142256Z 2025-09-07T06:57:15.4142571Z vmap is the vectorizing map; ``vmap(func)`` returns a new function that 2025-09-07T06:57:15.4143303Z maps ``func`` over some dimension of the inputs. Semantically, vmap 2025-09-07T06:57:15.4144061Z pushes the map into PyTorch operations called by ``func``, effectively 2025-09-07T06:57:15.4144675Z vectorizing those operations. 2025-09-07T06:57:15.4144955Z 2025-09-07T06:57:15.4145254Z vmap is useful for handling batch dimensions: one can write a function 2025-09-07T06:57:15.4145974Z ``func`` that runs on examples and then lift it to a function that can 2025-09-07T06:57:15.4147081Z take batches of examples with ``vmap(func)``. 
vmap can also be used to 2025-09-07T06:57:15.4147809Z compute batched gradients when composed with autograd. 2025-09-07T06:57:15.4148188Z 2025-09-07T06:57:15.4148339Z .. note:: 2025-09-07T06:57:15.4148777Z :func:`torch.vmap` is aliased to :func:`torch.func.vmap` for 2025-09-07T06:57:15.4149384Z convenience. Use whichever one you'd like. 2025-09-07T06:57:15.4149722Z 2025-09-07T06:57:15.4149849Z Args: 2025-09-07T06:57:15.4150409Z func (function): A Python function that takes one or more arguments. 2025-09-07T06:57:15.4151202Z Must return one or more Tensors. 2025-09-07T06:57:15.4151903Z in_dims (int or nested structure): Specifies which dimension of the 2025-09-07T06:57:15.4152707Z inputs should be mapped over. ``in_dims`` should have a 2025-09-07T06:57:15.4153510Z structure like the inputs. If the ``in_dim`` for a particular 2025-09-07T06:57:15.4154227Z input is None, then that indicates there is no map dimension. 2025-09-07T06:57:15.4154607Z Default: 0. 2025-09-07T06:57:15.4154886Z out_dims (int or Tuple[int]): Specifies where the mapped dimension 2025-09-07T06:57:15.4155289Z should appear in the outputs. If ``out_dims`` is a Tuple, then 2025-09-07T06:57:15.4155662Z it should have one element per output. Default: 0. 2025-09-07T06:57:15.4156031Z randomness (str): Specifies whether the randomness in this 2025-09-07T06:57:15.4156439Z vmap should be the same or different across batches. If 'different', 2025-09-07T06:57:15.4156858Z the randomness for each batch will be different. If 'same', the 2025-09-07T06:57:15.4157410Z randomness will be the same across batches. If 'error', any calls to 2025-09-07T06:57:15.4157839Z random functions will error. Default: 'error'. WARNING: this flag 2025-09-07T06:57:15.4158259Z only applies to random PyTorch operations and does not apply to 2025-09-07T06:57:15.4158617Z Python's random module or numpy randomness. 2025-09-07T06:57:15.4159104Z chunk_size (None or int): If None (default), apply a single vmap over inputs. 
2025-09-07T06:57:15.4159659Z If not None, then compute the vmap :attr:`chunk_size` samples at a time. 2025-09-07T06:57:15.4160128Z Note that :attr:`chunk_size=1` is equivalent to computing the vmap with a for-loop. 2025-09-07T06:57:15.4160634Z If you run into memory issues computing the vmap, please try a non-None chunk_size. 2025-09-07T06:57:15.4160931Z 2025-09-07T06:57:15.4161006Z Returns: 2025-09-07T06:57:15.4161258Z Returns a new "batched" function. It takes the same inputs as 2025-09-07T06:57:15.4161650Z ``func``, except each input has an extra dimension at the index 2025-09-07T06:57:15.4162042Z specified by ``in_dims``. It takes returns the same outputs as 2025-09-07T06:57:15.4162427Z ``func``, except each output has an extra dimension at the index 2025-09-07T06:57:15.4162749Z specified by ``out_dims``. 2025-09-07T06:57:15.4162902Z 2025-09-07T06:57:15.4162975Z .. warning: 2025-09-07T06:57:15.4163241Z :func:`vmap` works best with functional-style code. Please do not 2025-09-07T06:57:15.4163640Z perform any side-effects in ``func``, with the exception of 2025-09-07T06:57:15.4164063Z in-place PyTorch operations. Examples of side-effects include mutating 2025-09-07T06:57:15.4164521Z Python data structures and assigning values to variables not captured 2025-09-07T06:57:15.4164859Z in ``func``. 2025-09-07T06:57:15.4164972Z 2025-09-07T06:57:15.4165165Z One example of using :func:`vmap` is to compute batched dot products. PyTorch 2025-09-07T06:57:15.4165624Z doesn't provide a batched ``torch.dot`` API; instead of unsuccessfully 2025-09-07T06:57:15.4166059Z rummaging through docs, use :func:`vmap` to construct a new function. 
2025-09-07T06:57:15.4166319Z 2025-09-07T06:57:15.4166421Z >>> torch.dot # [D], [D] -> [] 2025-09-07T06:57:15.4166743Z >>> batched_dot = torch.func.vmap(torch.dot) # [N, D], [N, D] -> [N] 2025-09-07T06:57:15.4167185Z >>> x, y = torch.randn(2, 5), torch.randn(2, 5) 2025-09-07T06:57:15.4167467Z >>> batched_dot(x, y) 2025-09-07T06:57:15.4167606Z 2025-09-07T06:57:15.4167791Z :func:`vmap` can be helpful in hiding batch dimensions, leading to a simpler 2025-09-07T06:57:15.4168144Z model authoring experience. 2025-09-07T06:57:15.4168291Z 2025-09-07T06:57:15.4168382Z >>> batch_size, feature_size = 3, 5 2025-09-07T06:57:15.4168691Z >>> weights = torch.randn(feature_size, requires_grad=True) 2025-09-07T06:57:15.4168983Z >>> 2025-09-07T06:57:15.4169161Z >>> def model(feature_vec): 2025-09-07T06:57:15.4169413Z >>> # Very simple linear model with activation 2025-09-07T06:57:15.4169706Z >>> return feature_vec.dot(weights).relu() 2025-09-07T06:57:15.4169959Z >>> 2025-09-07T06:57:15.4170179Z >>> examples = torch.randn(batch_size, feature_size) 2025-09-07T06:57:15.4170491Z >>> result = torch.vmap(model)(examples) 2025-09-07T06:57:15.4170676Z 2025-09-07T06:57:15.4170894Z :func:`vmap` can also help vectorize computations that were previously difficult 2025-09-07T06:57:15.4171376Z or impossible to batch. One example is higher-order gradient computation. 2025-09-07T06:57:15.4171843Z The PyTorch autograd engine computes vjps (vector-Jacobian products). 2025-09-07T06:57:15.4172304Z Computing a full Jacobian matrix for some function f: R^N -> R^N usually 2025-09-07T06:57:15.4172788Z requires N calls to ``autograd.grad``, one per Jacobian row. Using :func:`vmap`, 2025-09-07T06:57:15.4173286Z we can vectorize the whole computation, computing the Jacobian in a single 2025-09-07T06:57:15.4173786Z call to ``autograd.grad``. 
2025-09-07T06:57:15.4174076Z 2025-09-07T06:57:15.4174150Z >>> # Setup 2025-09-07T06:57:15.4174348Z >>> N = 5 2025-09-07T06:57:15.4174546Z >>> f = lambda x: x**2 2025-09-07T06:57:15.4174785Z >>> x = torch.randn(N, requires_grad=True) 2025-09-07T06:57:15.4175045Z >>> y = f(x) 2025-09-07T06:57:15.4175228Z >>> I_N = torch.eye(N) 2025-09-07T06:57:15.4175433Z >>> 2025-09-07T06:57:15.4175716Z >>> # Sequential approach 2025-09-07T06:57:15.4176132Z >>> jacobian_rows = [torch.autograd.grad(y, x, v, retain_graph=True)[0] 2025-09-07T06:57:15.4176499Z >>> for v in I_N.unbind()] 2025-09-07T06:57:15.4176776Z >>> jacobian = torch.stack(jacobian_rows) 2025-09-07T06:57:15.4177028Z >>> 2025-09-07T06:57:15.4177228Z >>> # vectorized gradient computation 2025-09-07T06:57:15.4177491Z >>> def get_vjp(v): 2025-09-07T06:57:15.4177727Z >>> return torch.autograd.grad(y, x, v) 2025-09-07T06:57:15.4178014Z >>> jacobian = torch.vmap(get_vjp)(I_N) 2025-09-07T06:57:15.4178199Z 2025-09-07T06:57:15.4178422Z :func:`vmap` can also be nested, producing an output with multiple batched dimensions 2025-09-07T06:57:15.4178732Z 2025-09-07T06:57:15.4178825Z >>> torch.dot # [D], [D] -> [] 2025-09-07T06:57:15.4179087Z >>> batched_dot = torch.vmap( 2025-09-07T06:57:15.4179333Z ... torch.vmap(torch.dot) 2025-09-07T06:57:15.4179595Z ... 
) # [N1, N0, D], [N1, N0, D] -> [N1, N0] 2025-09-07T06:57:15.4179904Z >>> x, y = torch.randn(2, 3, 5), torch.randn(2, 3, 5) 2025-09-07T06:57:15.4180214Z >>> batched_dot(x, y) # tensor of size [2, 3] 2025-09-07T06:57:15.4180410Z 2025-09-07T06:57:15.4180603Z If the inputs are not batched along the first dimension, ``in_dims`` specifies 2025-09-07T06:57:15.4181016Z the dimension that each inputs are batched along as 2025-09-07T06:57:15.4181223Z 2025-09-07T06:57:15.4181321Z >>> torch.dot # [N], [N] -> [] 2025-09-07T06:57:15.4181662Z >>> batched_dot = torch.vmap(torch.dot, in_dims=1) # [N, D], [N, D] -> [D] 2025-09-07T06:57:15.4182030Z >>> x, y = torch.randn(2, 5), torch.randn(2, 5) 2025-09-07T06:57:15.4182292Z >>> batched_dot( 2025-09-07T06:57:15.4182492Z ... x, y 2025-09-07T06:57:15.4182758Z ... ) # output is [5] instead of [2] if batched along the 0th dimension 2025-09-07T06:57:15.4183010Z 2025-09-07T06:57:15.4183312Z If there are multiple inputs each of which is batched along different dimensions, 2025-09-07T06:57:15.4183781Z ``in_dims`` must be a tuple with the batch dimension for each input as 2025-09-07T06:57:15.4184041Z 2025-09-07T06:57:15.4184126Z >>> torch.dot # [D], [D] -> [] 2025-09-07T06:57:15.4184482Z >>> batched_dot = torch.vmap(torch.dot, in_dims=(0, None)) # [N, D], [D] -> [N] 2025-09-07T06:57:15.4184869Z >>> x, y = torch.randn(2, 5), torch.randn(5) 2025-09-07T06:57:15.4185127Z >>> batched_dot( 2025-09-07T06:57:15.4185324Z ... x, y 2025-09-07T06:57:15.4185607Z ... 
) # second arg doesn't have a batch dim because in_dim[1] was None 2025-09-07T06:57:15.4185861Z 2025-09-07T06:57:15.4186072Z If the input is a Python struct, ``in_dims`` must be a tuple containing a struct 2025-09-07T06:57:15.4186451Z matching the shape of the input: 2025-09-07T06:57:15.4186616Z 2025-09-07T06:57:15.4186744Z >>> f = lambda dict: torch.dot(dict["x"], dict["y"]) 2025-09-07T06:57:15.4187052Z >>> x, y = torch.randn(2, 5), torch.randn(5) 2025-09-07T06:57:15.4187330Z >>> input = {"x": x, "y": y} 2025-09-07T06:57:15.4187633Z >>> batched_dot = torch.vmap(f, in_dims=({"x": 0, "y": None},)) 2025-09-07T06:57:15.4187946Z >>> batched_dot(input) 2025-09-07T06:57:15.4188095Z 2025-09-07T06:57:15.4188318Z By default, the output is batched along the first dimension. However, it can be batched 2025-09-07T06:57:15.4188737Z along any dimension by using ``out_dims`` 2025-09-07T06:57:15.4188926Z 2025-09-07T06:57:15.4189004Z >>> f = lambda x: x**2 2025-09-07T06:57:15.4189225Z >>> x = torch.randn(2, 5) 2025-09-07T06:57:15.4189577Z >>> batched_pow = torch.vmap(f, out_dims=1) 2025-09-07T06:57:15.4189851Z >>> batched_pow(x) # [5, 2] 2025-09-07T06:57:15.4190014Z 2025-09-07T06:57:15.4190248Z For any function that uses kwargs, the returned function will not batch the kwargs but will 2025-09-07T06:57:15.4190664Z accept kwargs 2025-09-07T06:57:15.4190777Z 2025-09-07T06:57:15.4190870Z >>> x = torch.randn([2, 5]) 2025-09-07T06:57:15.4191103Z >>> def fn(x, scale=4.): 2025-09-07T06:57:15.4191396Z >>> return x * scale 2025-09-07T06:57:15.4191679Z >>> 2025-09-07T06:57:15.4191866Z >>> batched_pow = torch.vmap(fn) 2025-09-07T06:57:15.4192156Z >>> assert torch.allclose(batched_pow(x), x * 4) 2025-09-07T06:57:15.4192542Z >>> batched_pow(x, scale=x) # scale is not batched, output has shape [2, 2, 5] 2025-09-07T06:57:15.4192829Z 2025-09-07T06:57:15.4192904Z .. 
note:: 2025-09-07T06:57:15.4193187Z vmap does not provide general autobatching or handle variable-length 2025-09-07T06:57:15.4193553Z sequences out of the box. 2025-09-07T06:57:15.4193711Z 2025-09-07T06:57:15.4194123Z Original Error: IndentationError('expected an indented block after function definition on line 4', ('', 5, 1, '_._ = None\n', 5, 2)) 2025-09-07T06:57:15.4194620Z 2025-09-07T06:57:15.4194694Z _._ = None 2025-09-07T06:57:15.4194864Z ^ 2025-09-07T06:57:15.4195384Z msg = Cannot scrape callname=grad in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/apis.py line=306. 2025-09-07T06:57:15.4196044Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:15.4196524Z ``grad`` operator helps computing gradients of ``func`` with respect to the 2025-09-07T06:57:15.4196965Z input(s) specified by ``argnums``. This operator can be nested to 2025-09-07T06:57:15.4197317Z compute higher-order gradients. 2025-09-07T06:57:15.4197489Z 2025-09-07T06:57:15.4197569Z Args: 2025-09-07T06:57:15.4197845Z func (Callable): A Python function that takes one or more arguments. 2025-09-07T06:57:15.4198323Z Must return a single-element Tensor. If specified ``has_aux`` equals ``True``, 2025-09-07T06:57:15.4198848Z function can return a tuple of single-element Tensor and other auxiliary objects: 2025-09-07T06:57:15.4199238Z ``(output, aux)``. 2025-09-07T06:57:15.4199685Z argnums (int or Tuple[int]): Specifies arguments to compute gradients with respect to. 2025-09-07T06:57:15.4200175Z ``argnums`` can be single integer or tuple of integers. Default: 0. 2025-09-07T06:57:15.4200621Z has_aux (bool): Flag indicating that ``func`` returns a tensor and other 2025-09-07T06:57:15.4201032Z auxiliary objects: ``(output, aux)``. Default: False. 2025-09-07T06:57:15.4201262Z 2025-09-07T06:57:15.4201337Z Returns: 2025-09-07T06:57:15.4201676Z Function to compute gradients with respect to its inputs. 
By default, the output of 2025-09-07T06:57:15.4202192Z the function is the gradient tensor(s) with respect to the first argument. 2025-09-07T06:57:15.4202688Z If specified ``has_aux`` equals ``True``, tuple of gradients and output auxiliary objects 2025-09-07T06:57:15.4203206Z is returned. If ``argnums`` is a tuple of integers, a tuple of output gradients with 2025-09-07T06:57:15.4203618Z respect to each ``argnums`` value is returned. 2025-09-07T06:57:15.4203820Z 2025-09-07T06:57:15.4203926Z Example of using ``grad``: 2025-09-07T06:57:15.4204082Z 2025-09-07T06:57:15.4204177Z >>> # xdoctest: +SKIP 2025-09-07T06:57:15.4204420Z >>> from torch.func import grad 2025-09-07T06:57:15.4204688Z >>> x = torch.randn([]) 2025-09-07T06:57:15.4204945Z >>> cos_x = grad(lambda x: torch.sin(x))(x) 2025-09-07T06:57:15.4205246Z >>> assert torch.allclose(cos_x, x.cos()) 2025-09-07T06:57:15.4205499Z >>> 2025-09-07T06:57:15.4205693Z >>> # Second-order gradients 2025-09-07T06:57:15.4205987Z >>> neg_sin_x = grad(grad(lambda x: torch.sin(x)))(x) 2025-09-07T06:57:15.4206395Z >>> assert torch.allclose(neg_sin_x, -x.sin()) 2025-09-07T06:57:15.4206598Z 2025-09-07T06:57:15.4206800Z When composed with ``vmap``, ``grad`` can be used to compute per-sample-gradients: 2025-09-07T06:57:15.4207096Z 2025-09-07T06:57:15.4207181Z >>> # xdoctest: +SKIP 2025-09-07T06:57:15.4207437Z >>> from torch.func import grad, vmap 2025-09-07T06:57:15.4207722Z >>> batch_size, feature_size = 3, 5 2025-09-07T06:57:15.4208104Z >>> 2025-09-07T06:57:15.4208306Z >>> def model(weights, feature_vec): 2025-09-07T06:57:15.4208589Z >>> # Very simple linear model with activation 2025-09-07T06:57:15.4208873Z >>> assert feature_vec.dim() == 1 2025-09-07T06:57:15.4209152Z >>> return feature_vec.dot(weights).relu() 2025-09-07T06:57:15.4209404Z >>> 2025-09-07T06:57:15.4209615Z >>> def compute_loss(weights, example, target): 2025-09-07T06:57:15.4209902Z >>> y = model(weights, example) 2025-09-07T06:57:15.4210193Z >>> return ((y - target) 
** 2).mean() # MSELoss 2025-09-07T06:57:15.4210463Z >>> 2025-09-07T06:57:15.4210705Z >>> weights = torch.randn(feature_size, requires_grad=True) 2025-09-07T06:57:15.4211057Z >>> examples = torch.randn(batch_size, feature_size) 2025-09-07T06:57:15.4211363Z >>> targets = torch.randn(batch_size) 2025-09-07T06:57:15.4211643Z >>> inputs = (weights, examples, targets) 2025-09-07T06:57:15.4212016Z >>> grad_weight_per_example = vmap(grad(compute_loss), in_dims=(None, 0, 0))( 2025-09-07T06:57:15.4212374Z ... *inputs 2025-09-07T06:57:15.4212563Z ... ) 2025-09-07T06:57:15.4212673Z 2025-09-07T06:57:15.4212815Z Example of using ``grad`` with ``has_aux`` and ``argnums``: 2025-09-07T06:57:15.4213047Z 2025-09-07T06:57:15.4213126Z >>> # xdoctest: +SKIP 2025-09-07T06:57:15.4213363Z >>> from torch.func import grad 2025-09-07T06:57:15.4213622Z >>> def my_loss_func(y, y_pred): 2025-09-07T06:57:15.4213974Z >>> loss_per_sample = (0.5 * y_pred - y) ** 2 2025-09-07T06:57:15.4214268Z >>> loss = loss_per_sample.mean() 2025-09-07T06:57:15.4214553Z >>> return loss, (y_pred, loss_per_sample) 2025-09-07T06:57:15.4214808Z >>> 2025-09-07T06:57:15.4215030Z >>> fn = grad(my_loss_func, argnums=(0, 1), has_aux=True) 2025-09-07T06:57:15.4215535Z >>> y_true = torch.rand(4) 2025-09-07T06:57:15.4215810Z >>> y_preds = torch.rand(4, requires_grad=True) 2025-09-07T06:57:15.4216094Z >>> out = fn(y_true, y_preds) 2025-09-07T06:57:15.4216456Z >>> # > output is ((grads w.r.t y_true, grads w.r.t y_preds), (y_pred, loss_per_sample)) 2025-09-07T06:57:15.4216746Z 2025-09-07T06:57:15.4216822Z .. note:: 2025-09-07T06:57:15.4217068Z Using PyTorch ``torch.no_grad`` together with ``grad``. 
2025-09-07T06:57:15.4217300Z 2025-09-07T06:57:15.4217423Z Case 1: Using ``torch.no_grad`` inside a function: 2025-09-07T06:57:15.4217639Z 2025-09-07T06:57:15.4217729Z >>> # xdoctest: +SKIP 2025-09-07T06:57:15.4217962Z >>> def f(x): 2025-09-07T06:57:15.4218177Z >>> with torch.no_grad(): 2025-09-07T06:57:15.4218427Z >>> c = x ** 2 2025-09-07T06:57:15.4218663Z >>> return x - c 2025-09-07T06:57:15.4218810Z 2025-09-07T06:57:15.4218980Z In this case, ``grad(f)(x)`` will respect the inner ``torch.no_grad``. 2025-09-07T06:57:15.4219230Z 2025-09-07T06:57:15.4219381Z Case 2: Using ``grad`` inside ``torch.no_grad`` context manager: 2025-09-07T06:57:15.4219616Z 2025-09-07T06:57:15.4219701Z >>> # xdoctest: +SKIP 2025-09-07T06:57:15.4219938Z >>> with torch.no_grad(): 2025-09-07T06:57:15.4220178Z >>> grad(f)(x) 2025-09-07T06:57:15.4220320Z 2025-09-07T06:57:15.4220504Z In this case, ``grad`` will respect the inner ``torch.no_grad``, but not the 2025-09-07T06:57:15.4220946Z outer one. This is because ``grad`` is a "function transform": its result 2025-09-07T06:57:15.4221482Z should not depend on the result of a context manager outside of ``f``. 2025-09-07T06:57:15.4221748Z 2025-09-07T06:57:15.4221820Z 2025-09-07T06:57:15.4222327Z Original Error: IndentationError('expected an indented block after function definition on line 5', ('', 6, 1, '_._ = None\n', 6, 2)) 2025-09-07T06:57:15.4222824Z 2025-09-07T06:57:15.4222968Z _._ = None 2025-09-07T06:57:15.4223210Z ^ 2025-09-07T06:57:15.7116059Z msg = Cannot scrape callname=ReduceLROnPlateau in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py line=1236. 2025-09-07T06:57:15.7117385Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:15.7118133Z Reduce learning rate when a metric has stopped improving. 
2025-09-07T06:57:15.7118534Z 2025-09-07T06:57:15.7118818Z Models often benefit from reducing the learning rate by a factor 2025-09-07T06:57:15.7119518Z of 2-10 once learning stagnates. This scheduler reads a metrics 2025-09-07T06:57:15.7120190Z quantity and if no improvement is seen for a 'patience' number 2025-09-07T06:57:15.7120784Z of epochs, the learning rate is reduced. 2025-09-07T06:57:15.7121108Z 2025-09-07T06:57:15.7121234Z Args: 2025-09-07T06:57:15.7121601Z optimizer (Optimizer): Wrapped optimizer. 2025-09-07T06:57:15.7122162Z mode (str): One of `min`, `max`. In `min` mode, lr will 2025-09-07T06:57:15.7122753Z be reduced when the quantity monitored has stopped 2025-09-07T06:57:15.7123358Z decreasing; in `max` mode it will be reduced when the 2025-09-07T06:57:15.7124005Z quantity monitored has stopped increasing. Default: 'min'. 2025-09-07T06:57:15.7124656Z factor (float): Factor by which the learning rate will be 2025-09-07T06:57:15.7125240Z reduced. new_lr = lr * factor. Default: 0.1. 2025-09-07T06:57:15.7125972Z patience (int): The number of allowed epochs with no improvement after 2025-09-07T06:57:15.7126748Z which the learning rate will be reduced. 2025-09-07T06:57:15.7127523Z For example, consider the case of having no patience (`patience = 0`). 2025-09-07T06:57:15.7128676Z In the first epoch, a baseline is established and is always considered good as there's no previous baseline. 2025-09-07T06:57:15.7129880Z In the second epoch, if the performance is worse than the baseline, 2025-09-07T06:57:15.7130341Z we have what is considered an intolerable epoch. 2025-09-07T06:57:15.7130776Z Since the count of intolerable epochs (1) is greater than the patience level (0), 2025-09-07T06:57:15.7131227Z the learning rate is reduced at the end of this epoch. 2025-09-07T06:57:15.7131720Z From the third epoch onwards, the learning rate continues to be reduced at the end of each epoch 2025-09-07T06:57:15.7132322Z if the performance is worse than the baseline. 
If the performance improves or remains the same, 2025-09-07T06:57:15.7132768Z the learning rate is not adjusted. 2025-09-07T06:57:15.7133036Z Default: 10. 2025-09-07T06:57:15.7133331Z threshold (float): Threshold for measuring the new optimum, 2025-09-07T06:57:15.7133699Z to only focus on significant changes. Default: 1e-4. 2025-09-07T06:57:15.7134168Z threshold_mode (str): One of `rel`, `abs`. In `rel` mode, 2025-09-07T06:57:15.7134532Z dynamic_threshold = best * ( 1 + threshold ) in 'max' 2025-09-07T06:57:15.7134870Z mode or best * ( 1 - threshold ) in `min` mode. 2025-09-07T06:57:15.7135210Z In `abs` mode, dynamic_threshold = best + threshold in 2025-09-07T06:57:15.7135579Z `max` mode or best - threshold in `min` mode. Default: 'rel'. 2025-09-07T06:57:15.7135957Z cooldown (int): Number of epochs to wait before resuming 2025-09-07T06:57:15.7136328Z normal operation after lr has been reduced. Default: 0. 2025-09-07T06:57:15.7136793Z min_lr (float or list): A scalar or a list of scalars. A 2025-09-07T06:57:15.7137138Z lower bound on the learning rate of all param groups 2025-09-07T06:57:15.7137467Z or each group respectively. Default: 0. 2025-09-07T06:57:15.7137809Z eps (float): Minimal decay applied to lr. If the difference 2025-09-07T06:57:15.7138285Z between new and old lr is smaller than eps, the update is 2025-09-07T06:57:15.7138688Z ignored. Default: 1e-8. 2025-09-07T06:57:15.7138847Z 2025-09-07T06:57:15.7138928Z Example: 2025-09-07T06:57:15.7139118Z >>> # xdoctest: +SKIP 2025-09-07T06:57:15.7139458Z >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) 2025-09-07T06:57:15.7139868Z >>> scheduler = ReduceLROnPlateau(optimizer, "min") 2025-09-07T06:57:15.7140171Z >>> for epoch in range(10): 2025-09-07T06:57:15.7140409Z >>> train(...) 2025-09-07T06:57:15.7140637Z >>> val_loss = validate(...) 
2025-09-07T06:57:15.7140949Z >>> # Note that step should be called after validate() 2025-09-07T06:57:15.7141269Z >>> scheduler.step(val_loss) 2025-09-07T06:57:15.7141450Z 2025-09-07T06:57:15.7141630Z .. image:: ../scripts/lr_scheduler_images/ReduceLROnPlateau.png 2025-09-07T06:57:15.7141958Z 2025-09-07T06:57:15.7142403Z Original Error: IndentationError('unexpected indent', ('', 8, 4, ' scheduler.step(val_loss)\n', 8, -1)) 2025-09-07T06:57:15.7142821Z 2025-09-07T06:57:15.7142918Z scheduler.step(val_loss) 2025-09-07T06:57:15.7143140Z ^ 2025-09-07T06:57:15.7928166Z gathering tests 2025-09-07T06:57:15.7945389Z running 863 test(s) 2025-09-07T06:57:15.7952922Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::typename:0, line 1082 <- wrt source file 2025-09-07T06:57:15.7961723Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::typename:0 2025-09-07T06:57:15.7962707Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::is_tensor:0, line 1118 <- wrt source file 2025-09-07T06:57:15.7965951Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::is_tensor:0 2025-09-07T06:57:15.7967046Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::set_default_device:0, line 1203 <- wrt source file 2025-09-07T06:57:15.7968876Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::set_default_device:0 2025-09-07T06:57:15.7969870Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::set_default_tensor_type:0, line 1252 <- wrt source file 2025-09-07T06:57:15.7970867Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::set_default_tensor_type:0 2025-09-07T06:57:15.7971811Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::set_default_dtype:0, line 1289 <- wrt source file 2025-09-07T06:57:15.7974462Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::set_default_dtype:0 2025-09-07T06:57:15.7975437Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::use_deterministic_algorithms:0, line 1444 <- wrt source file 2025-09-07T06:57:15.7976475Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::use_deterministic_algorithms:0 2025-09-07T06:57:15.7977405Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::compile:0, line 2568 <- wrt source file 2025-09-07T06:57:15.7978275Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::compile:0 2025-09-07T06:57:15.7979231Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::_is_device_backend_autoload_enabled:0, line 2841 <- wrt source file 2025-09-07T06:57:15.7980451Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py::_is_device_backend_autoload_enabled:0 2025-09-07T06:57:15.7981513Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::Library.define:0, line 153 <- wrt source file 2025-09-07T06:57:15.7985318Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::Library.define:0 2025-09-07T06:57:15.7986335Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::Library._impl_with_aoti_compile:0, line 247 <- wrt source file 2025-09-07T06:57:15.7996726Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::Library._impl_with_aoti_compile:0 2025-09-07T06:57:15.7997696Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::Library.impl:0, line 307 <- wrt source file 2025-09-07T06:57:15.8001709Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::Library.impl:0 2025-09-07T06:57:15.8002593Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::define:0, line 512 <- wrt source file 2025-09-07T06:57:15.9350261Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::define:0 2025-09-07T06:57:15.9351911Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::impl:0, line 618 <- wrt source file 2025-09-07T06:57:15.9367106Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::impl:0 2025-09-07T06:57:15.9368013Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_kernel:0, line 799 <- wrt source file 2025-09-07T06:57:15.9369255Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_kernel:0 2025-09-07T06:57:15.9370320Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_autocast:0, line 867 <- wrt source file 2025-09-07T06:57:15.9372453Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_autocast:0 2025-09-07T06:57:15.9374251Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_autograd:0, line 1116 <- wrt source file 2025-09-07T06:57:16.0204778Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_autograd:0 2025-09-07T06:57:16.0206473Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_torch_dispatch:0, line 1232 <- wrt source file 2025-09-07T06:57:16.0287143Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_torch_dispatch:0 2025-09-07T06:57:16.0288784Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_vmap:0, line 1321 <- wrt source file 2025-09-07T06:57:16.0455631Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::register_vmap:0 2025-09-07T06:57:16.0457134Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::opcheck:0, line 1646 <- wrt source file 2025-09-07T06:57:16.0458622Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py::opcheck:0 2025-09-07T06:57:16.0460119Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_custom_ops.py::custom_op:0, line 55 <- wrt source file 2025-09-07T06:57:16.0461665Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_custom_ops.py::custom_op:0 2025-09-07T06:57:16.0463169Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_custom_ops.py::impl:0, line 138 <- wrt source file 2025-09-07T06:57:16.0464058Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_custom_ops.py::impl:0 2025-09-07T06:57:16.0465061Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_custom_ops.py::impl_abstract:0, line 208 <- wrt source file 2025-09-07T06:57:16.0567224Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_custom_ops.py::impl_abstract:0 2025-09-07T06:57:16.0568853Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/quasirandom.py::SobolEngine:0, line 39 <- wrt source file 2025-09-07T06:57:16.0570449Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/quasirandom.py::SobolEngine:0 2025-09-07T06:57:16.0572132Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::Generator:0, line 15 <- wrt source file 2025-09-07T06:57:16.0573573Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::Generator:0 2025-09-07T06:57:16.0574701Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::_LinAlgError:0, line 5 <- wrt source file 2025-09-07T06:57:16.0575770Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::_LinAlgError:0 2025-09-07T06:57:16.0576751Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_hook:0, line 649 <- wrt source file 2025-09-07T06:57:16.0577697Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_hook:0 2025-09-07T06:57:16.0578711Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_post_accumulate_grad_hook:0, line 706 <- wrt source file 2025-09-07T06:57:16.0596337Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_post_accumulate_grad_hook:0 2025-09-07T06:57:16.0597473Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.refine_names:0, line 1333 <- wrt source file 2025-09-07T06:57:16.0704036Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.refine_names:0 2025-09-07T06:57:16.0709740Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.align_to:0, line 1378 <- wrt source file 2025-09-07T06:57:16.0714253Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.align_to:0 2025-09-07T06:57:16.0715230Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.rename:0, line 1451 <- wrt source file 2025-09-07T06:57:16.0722421Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.rename:0 2025-09-07T06:57:16.0723408Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.to_sparse_coo:0, line 1481 <- wrt source file 2025-09-07T06:57:16.0728745Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.to_sparse_coo:0 2025-09-07T06:57:16.0729743Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.dim_order:0, line 1513 <- wrt source file 2025-09-07T06:57:16.0749682Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py::Tensor.dim_order:0 2025-09-07T06:57:16.0750618Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/torch_version.py::TorchVersion:0, line 19 <- wrt source file 2025-09-07T06:57:16.0751696Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/torch_version.py::TorchVersion:0 2025-09-07T06:57:16.0752562Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::list:0, line 473 <- wrt source file 2025-09-07T06:57:16.0753955Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::list:0 2025-09-07T06:57:16.0755421Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::help:0, line 533 <- wrt source file 2025-09-07T06:57:16.0756801Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::help:0 2025-09-07T06:57:16.0758139Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::load:0, line 624 <- wrt source file 2025-09-07T06:57:16.0759482Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::load:0 2025-09-07T06:57:16.0760844Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::_load_local:0, line 672 <- wrt source file 2025-09-07T06:57:16.0762297Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::_load_local:0 2025-09-07T06:57:16.0763525Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::download_url_to_file:0, line 707 <- wrt source file 2025-09-07T06:57:16.0764429Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::download_url_to_file:0 2025-09-07T06:57:16.0765323Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::load_state_dict_from_url:0, line 847 <- wrt source file 2025-09-07T06:57:16.0766244Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py::load_state_dict_from_url:0 2025-09-07T06:57:16.0767169Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor_str.py::set_printoptions:0, line 53 <- wrt source file 2025-09-07T06:57:16.0772703Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor_str.py::set_printoptions:0 2025-09-07T06:57:16.0773735Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::broadcast_tensors:0, line 64 <- wrt source file 2025-09-07T06:57:16.0780912Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::broadcast_tensors:0 2025-09-07T06:57:16.0782211Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::broadcast_shapes:0, line 92 <- wrt source file 2025-09-07T06:57:16.0784390Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::broadcast_shapes:0 2025-09-07T06:57:16.0785313Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::split:0, line 144 <- wrt source file 2025-09-07T06:57:16.0798389Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::split:0 2025-09-07T06:57:16.0799311Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::einsum:0, line 258 <- wrt source file 2025-09-07T06:57:16.0817547Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::einsum:0 2025-09-07T06:57:16.0818437Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::meshgrid:0, line 450 <- wrt source file 2025-09-07T06:57:16.0865577Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::meshgrid:0 
2025-09-07T06:57:16.0867101Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::_unique_impl:0, line 835 <- wrt source file 2025-09-07T06:57:16.0915186Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::_unique_impl:0 2025-09-07T06:57:16.0916860Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::_unique_consecutive_impl:0, line 992 <- wrt source file 2025-09-07T06:57:16.0928122Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::_unique_consecutive_impl:0 2025-09-07T06:57:16.0929205Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::tensordot:0, line 1267 <- wrt source file 2025-09-07T06:57:16.0940252Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::tensordot:0 2025-09-07T06:57:16.0941180Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::cartesian_prod:0, line 1351 <- wrt source file 2025-09-07T06:57:16.0948541Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::cartesian_prod:0 2025-09-07T06:57:16.0949466Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::block_diag:0, line 1385 <- wrt source file 2025-09-07T06:57:16.0959721Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::block_diag:0 2025-09-07T06:57:16.0960601Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::cdist:0, line 1441 <- wrt source file 2025-09-07T06:57:16.0975448Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::cdist:0 2025-09-07T06:57:16.0976315Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::atleast_1d:0, line 1482 <- wrt source file 2025-09-07T06:57:16.0994747Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::atleast_1d:0 2025-09-07T06:57:16.0996258Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::atleast_2d:0, line 1520 <- wrt source file 2025-09-07T06:57:16.1014600Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::atleast_2d:0 2025-09-07T06:57:16.1016143Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::atleast_3d:0, line 1560 <- wrt source file 2025-09-07T06:57:16.1039159Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::atleast_3d:0 2025-09-07T06:57:16.1040639Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::norm:0, line 1735 <- wrt source file 2025-09-07T06:57:16.1076133Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::norm:0 2025-09-07T06:57:16.1077659Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::unravel_index:0, line 1903 <- wrt source file 2025-09-07T06:57:16.1107573Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::unravel_index:0 2025-09-07T06:57:16.1108701Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::chain_matmul:0, line 2003 <- wrt source file 2025-09-07T06:57:16.1109839Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::chain_matmul:0 2025-09-07T06:57:16.1110943Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::_lu_impl:0, line 2104 <- wrt source file 2025-09-07T06:57:16.1112043Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/functional.py::_lu_impl:0 2025-09-07T06:57:16.1113174Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::add_safe_globals:0, line 299 <- wrt source file 2025-09-07T06:57:16.1114359Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::add_safe_globals:0 2025-09-07T06:57:16.1115295Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::safe_globals:0, line 324 <- wrt source file 2025-09-07T06:57:16.1116357Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::safe_globals:0 2025-09-07T06:57:16.1128140Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::skip_data:0, line 400 <- wrt source file 2025-09-07T06:57:16.1129205Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::skip_data:0 2025-09-07T06:57:16.1130218Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::register_package:0, line 472 <- wrt source file 2025-09-07T06:57:16.1131255Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::register_package:0 2025-09-07T06:57:16.1132191Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::save:0, line 950 <- wrt source file 2025-09-07T06:57:16.1133103Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::save:0 2025-09-07T06:57:16.1134081Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::load:0, line 1363 <- wrt source file 2025-09-07T06:57:16.1134994Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/serialization.py::load:0 2025-09-07T06:57:16.1135930Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::get_ignored_functions:0, line 116 <- wrt source file 2025-09-07T06:57:16.1136933Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::get_ignored_functions:0 2025-09-07T06:57:16.1137894Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::get_testing_overrides:0, line 423 <- wrt 
source file 2025-09-07T06:57:16.1170674Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::get_testing_overrides:0 2025-09-07T06:57:16.1172472Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::wrap_torch_function:0, line 1578 <- wrt source file 2025-09-07T06:57:16.1174277Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::wrap_torch_function:0 2025-09-07T06:57:16.1175262Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::handle_torch_function:0, line 1713 <- wrt source file 2025-09-07T06:57:16.1177342Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::handle_torch_function:0 2025-09-07T06:57:16.1178377Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::is_tensor_method_or_property:0, line 1961 <- wrt source file 2025-09-07T06:57:16.1209358Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::is_tensor_method_or_property:0 2025-09-07T06:57:16.1211042Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::is_tensor_like:0, line 1980 <- wrt source file 2025-09-07T06:57:16.1217043Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/overrides.py::is_tensor_like:0 2025-09-07T06:57:16.1218013Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_namedtensor_internals.py::update_names:0, line 118 <- wrt source file 2025-09-07T06:57:16.1219218Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_namedtensor_internals.py::update_names:0 2025-09-07T06:57:16.1220198Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py::list_mode_options:0, line 320 <- wrt source file 2025-09-07T06:57:16.1222236Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py::list_mode_options:0 
2025-09-07T06:57:16.1223290Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py::list_options:0, line 357 <- wrt source file 2025-09-07T06:57:16.1236609Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py::list_options:0 2025-09-07T06:57:16.1237650Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/monitor/__init__.py::TensorboardEventHandler:0, line 22 <- wrt source file 2025-09-07T06:57:16.1258632Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/monitor/__init__.py::TensorboardEventHandler:0 2025-09-07T06:57:16.1260424Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::allow_in_graph:0, line 127 <- wrt source file 2025-09-07T06:57:16.1262122Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::allow_in_graph:0 2025-09-07T06:57:16.1263791Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::substitute_in_graph:0, line 183 <- wrt source file 2025-09-07T06:57:16.5033110Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::substitute_in_graph:0 2025-09-07T06:57:16.5034950Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::wrap_numpy:0, line 413 <- wrt source file 2025-09-07T06:57:16.5036684Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::wrap_numpy:0 2025-09-07T06:57:16.5038328Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::is_compiling:0, line 445 <- wrt source file 2025-09-07T06:57:16.5040243Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::is_compiling:0 2025-09-07T06:57:16.5041804Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::is_dynamo_compiling:0, line 
466 <- wrt source file 2025-09-07T06:57:16.5042874Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::is_dynamo_compiling:0 2025-09-07T06:57:16.5043872Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::is_exporting:0, line 484 <- wrt source file 2025-09-07T06:57:16.5045439Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::is_exporting:0 2025-09-07T06:57:16.5046466Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::save_cache_artifacts:0, line 499 <- wrt source file 2025-09-07T06:57:16.5047530Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::save_cache_artifacts:0 2025-09-07T06:57:16.5048574Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::load_cache_artifacts:0, line 514 <- wrt source file 2025-09-07T06:57:16.5049627Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/compiler/__init__.py::load_cache_artifacts:0 2025-09-07T06:57:16.5050602Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/__init__.py::annotate:0, line 147 <- wrt source file 2025-09-07T06:57:16.5051607Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/__init__.py::annotate:0 2025-09-07T06:57:16.5052479Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py::save:0, line 349 <- wrt source file 2025-09-07T06:57:16.5053375Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py::save:0 2025-09-07T06:57:16.5054452Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py::load:0, line 419 <- wrt source file 2025-09-07T06:57:16.5055477Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py::load:0 2025-09-07T06:57:16.5056413Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py::register_dataclass:0, line 576 <- wrt source file 2025-09-07T06:57:16.5057438Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py::register_dataclass:0 2025-09-07T06:57:16.5058377Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py::sum:0, line 223 <- wrt source file 2025-09-07T06:57:16.5065210Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py::sum:0 2025-09-07T06:57:16.5066201Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py::check_sparse_tensor_invariants:0, line 475 <- wrt source file 2025-09-07T06:57:16.5077079Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py::check_sparse_tensor_invariants:0 2025-09-07T06:57:16.5078110Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py::as_sparse_gradcheck:0, line 561 <- wrt source file 2025-09-07T06:57:16.5182469Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py::as_sparse_gradcheck:0 2025-09-07T06:57:16.5183765Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py::current_accelerator:0, line 113 <- wrt source file 2025-09-07T06:57:16.8320645Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py::current_accelerator:0 2025-09-07T06:57:16.8321741Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py::device_index:0, line 249 <- wrt source file 2025-09-07T06:57:16.8322809Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py::device_index:0 2025-09-07T06:57:16.8323872Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_prims_common/__init__.py::compute_required_storage_length:0, line 1884 <- 
wrt source file 2025-09-07T06:57:16.8330382Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_prims_common/__init__.py::compute_required_storage_length:0 2025-09-07T06:57:16.8331499Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/mps/__init__.py::compile_shader:0, line 148 <- wrt source file 2025-09-07T06:57:16.8332484Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/mps/__init__.py::compile_shader:0 2025-09-07T06:57:16.8333438Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.then:0, line 148 <- wrt source file 2025-09-07T06:57:16.8334538Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.then:0 2025-09-07T06:57:16.8335552Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.add_done_callback:0, line 197 <- wrt source file 2025-09-07T06:57:16.8336631Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.add_done_callback:0 2025-09-07T06:57:16.8337788Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_result:0, line 231 <- wrt source file 2025-09-07T06:57:16.8338794Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_result:0 2025-09-07T06:57:16.8339868Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_exception:0, line 261 <- wrt source file 2025-09-07T06:57:16.8340978Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_exception:0 2025-09-07T06:57:16.8341941Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::collect_all:0, line 295 <- wrt source file 2025-09-07T06:57:16.8342887Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/futures/__init__.py::collect_all:0 2025-09-07T06:57:16.8343810Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::as_nested_tensor:0, line 61 <- wrt source file 2025-09-07T06:57:16.8365090Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::as_nested_tensor:0 2025-09-07T06:57:16.8366062Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::nested_tensor:0, line 240 <- wrt source file 2025-09-07T06:57:16.8371697Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::nested_tensor:0 2025-09-07T06:57:16.8373810Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::narrow:0, line 315 <- wrt source file 2025-09-07T06:57:16.8430137Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::narrow:0 2025-09-07T06:57:16.8431599Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::nested_tensor_from_jagged:0, line 405 <- wrt source file 2025-09-07T06:57:16.8438232Z W0907 06:57:16.843000 9916 site-packages/torch/fx/_symbolic_trace.py:52] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile. 
2025-09-07T06:57:16.8456108Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::nested_tensor_from_jagged:0 2025-09-07T06:57:16.8457285Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::masked_select:0, line 481 <- wrt source file 2025-09-07T06:57:16.8476456Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py::masked_select:0 2025-09-07T06:57:16.8477575Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py::_compile_kernel:0, line 1760 <- wrt source file 2025-09-07T06:57:16.8479190Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py::_compile_kernel:0 2025-09-07T06:57:16.8480963Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py::add_preprocessing_fn:0, line 3473 <- wrt source file 2025-09-07T06:57:16.8482871Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py::add_preprocessing_fn:0 2025-09-07T06:57:16.8484655Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py::WritableTempFile:0, line 374 <- wrt source file 2025-09-07T06:57:16.8486421Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py::WritableTempFile:0 2025-09-07T06:57:16.8488267Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cpp_builder.py::get_name_and_dir_from_output_file_path:0, line 1721 <- wrt source file 2025-09-07T06:57:16.8489543Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cpp_builder.py::get_name_and_dir_from_output_file_path:0 2025-09-07T06:57:16.8490825Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/template_heuristics/registry.py::register_template_heuristic:0, line 54 <- wrt source file 2025-09-07T06:57:16.8492097Z * 
SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/template_heuristics/registry.py::register_template_heuristic:0 2025-09-07T06:57:16.8493229Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::checkpoint_sequential:0, line 555 <- wrt source file 2025-09-07T06:57:16.8494374Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::checkpoint_sequential:0 2025-09-07T06:57:16.8495432Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::set_checkpoint_early_stop:0, line 757 <- wrt source file 2025-09-07T06:57:16.8496509Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::set_checkpoint_early_stop:0 2025-09-07T06:57:16.8497605Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::SelectiveCheckpointContext:0, line 1234 <- wrt source file 2025-09-07T06:57:16.8498731Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::SelectiveCheckpointContext:0 2025-09-07T06:57:16.8499851Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::create_selective_checkpoint_contexts:0, line 1390 <- wrt source file 2025-09-07T06:57:16.8501036Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py::create_selective_checkpoint_contexts:0 2025-09-07T06:57:16.8502142Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/dlpack.py::from_dlpack:0, line 93 <- wrt source file 2025-09-07T06:57:16.8516091Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/dlpack.py::from_dlpack:0 2025-09-07T06:57:16.8517103Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::register_pytree_node:0, line 156 <- wrt source file 2025-09-07T06:57:16.8518787Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::register_pytree_node:0 2025-09-07T06:57:16.8520451Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_is_leaf:0, line 277 <- wrt source file 2025-09-07T06:57:16.8525075Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_is_leaf:0 2025-09-07T06:57:16.8526707Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_flatten:0, line 320 <- wrt source file 2025-09-07T06:57:16.8532358Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_flatten:0 2025-09-07T06:57:16.8533305Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_unflatten:0, line 357 <- wrt source file 2025-09-07T06:57:16.8537867Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_unflatten:0 2025-09-07T06:57:16.8538824Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_iter:0, line 387 <- wrt source file 2025-09-07T06:57:16.8544861Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_iter:0 2025-09-07T06:57:16.8545788Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_leaves:0, line 422 <- wrt source file 2025-09-07T06:57:16.8550104Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_leaves:0 2025-09-07T06:57:16.8551129Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_structure:0, line 457 <- wrt source file 2025-09-07T06:57:16.8555168Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_structure:0 2025-09-07T06:57:16.8556116Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_map:0, line 494 <- wrt source file 2025-09-07T06:57:16.8561652Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::tree_map:0 2025-09-07T06:57:16.8562606Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::broadcast_prefix:0, line 893 <- wrt source file 2025-09-07T06:57:16.8571258Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_cxx_pytree.py::broadcast_prefix:0 2025-09-07T06:57:16.8572238Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CppExtension:0, line 1186 <- wrt source file 2025-09-07T06:57:16.8573244Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CppExtension:0 2025-09-07T06:57:16.8574310Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:0, line 1258 <- wrt source file 2025-09-07T06:57:16.8575307Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:0 2025-09-07T06:57:16.8576412Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:1, line 1336 <- wrt source file 2025-09-07T06:57:16.8577430Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:1 2025-09-07T06:57:16.8578402Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::SyclExtension:0, line 1448 <- wrt source file 2025-09-07T06:57:16.8579388Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::SyclExtension:0 2025-09-07T06:57:16.8580316Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load:0, line 1684 <- wrt source file 2025-09-07T06:57:16.8581238Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load:0 2025-09-07T06:57:16.8582159Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load_inline:0, line 1960 <- wrt source file 2025-09-07T06:57:16.8583131Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load_inline:0 2025-09-07T06:57:16.8584184Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/backend_registration.py::rename_privateuse1_backend:0, line 69 <- wrt source file 2025-09-07T06:57:16.8585337Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/backend_registration.py::rename_privateuse1_backend:0 2025-09-07T06:57:16.8586518Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/backend_registration.py::generate_methods_for_privateuse1_backend:0, line 375 <- wrt source file 2025-09-07T06:57:16.8587870Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/backend_registration.py::generate_methods_for_privateuse1_backend:0 2025-09-07T06:57:16.8600044Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/backend_registration.py::_get_custom_mod_func:0, line 410 <- wrt source file 2025-09-07T06:57:16.8601281Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/backend_registration.py::_get_custom_mod_func:0 2025-09-07T06:57:16.8602312Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::register_dataclass:0, line 303 <- wrt source file 2025-09-07T06:57:16.8603318Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::register_dataclass:0 2025-09-07T06:57:16.8604286Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::register_constant:0, line 419 <- wrt source file 2025-09-07T06:57:16.8605263Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::register_constant:0 2025-09-07T06:57:16.8606200Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::tree_is_leaf:0, line 1026 <- wrt source file 2025-09-07T06:57:16.8608594Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::tree_is_leaf:0 2025-09-07T06:57:16.8609521Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::tree_map:0, line 1345 <- wrt source file 2025-09-07T06:57:16.8615764Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_pytree.py::tree_map:0 2025-09-07T06:57:16.8616815Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/throughput_benchmark.py::ThroughputBenchmark:0, line 77 <- wrt source file 2025-09-07T06:57:16.8617982Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/throughput_benchmark.py::ThroughputBenchmark:0 2025-09-07T06:57:16.8619169Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/dataset.py::IterableDataset:0, line 94 <- wrt source file 2025-09-07T06:57:16.8624377Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/dataset.py::IterableDataset:0 2025-09-07T06:57:16.8625402Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/dataset.py::StackDataset:0, line 219 <- wrt source file 2025-09-07T06:57:16.8626421Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/dataset.py::StackDataset:0 2025-09-07T06:57:16.8627404Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/dataset.py::random_split:0, line 441 <- wrt source file 2025-09-07T06:57:16.8628405Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/dataset.py::random_split:0 2025-09-07T06:57:16.8629450Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/distributed.py::DistributedSampler:0, line 55 <- wrt source file 2025-09-07T06:57:16.8630567Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/distributed.py::DistributedSampler:0 2025-09-07T06:57:16.8631579Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py::Sampler:0, line 40 <- wrt source file 2025-09-07T06:57:16.8632570Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py::Sampler:0 2025-09-07T06:57:16.8633588Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py::WeightedRandomSampler:0, line 238 <- wrt source file 2025-09-07T06:57:16.8634810Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py::WeightedRandomSampler:0 2025-09-07T06:57:16.8635917Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py::BatchSampler:0, line 309 <- wrt source file 2025-09-07T06:57:16.8640409Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py::BatchSampler:0 2025-09-07T06:57:16.8641483Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::IterDataPipe:0, line 97 <- wrt source file 2025-09-07T06:57:16.8644046Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::IterDataPipe:0 2025-09-07T06:57:16.8645169Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::MapDataPipe:0, line 268 <- wrt source file 2025-09-07T06:57:16.8646295Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::MapDataPipe:0 2025-09-07T06:57:16.8647431Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/common.py::validate_input_col:0, line 37 <- wrt source file 2025-09-07T06:57:16.8648658Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/common.py::validate_input_col:0 2025-09-07T06:57:16.8649827Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/decoder.py::basichandlers:0, line 47 <- wrt source file 2025-09-07T06:57:16.8651347Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/decoder.py::basichandlers:0 2025-09-07T06:57:16.8652525Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/selecting.py::FilterIterDataPipe:0, line 37 <- wrt source file 2025-09-07T06:57:16.8653914Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/selecting.py::FilterIterDataPipe:0 2025-09-07T06:57:16.8655152Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::BatcherIterDataPipe:0, line 53 <- wrt source file 2025-09-07T06:57:16.8656398Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::BatcherIterDataPipe:0 2025-09-07T06:57:16.8657602Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::UnBatcherIterDataPipe:0, line 113 <- wrt source file 2025-09-07T06:57:16.8658848Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::UnBatcherIterDataPipe:0 2025-09-07T06:57:16.8660048Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::GrouperIterDataPipe:0, line 180 <- wrt source file 2025-09-07T06:57:16.8661407Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::GrouperIterDataPipe:0 2025-09-07T06:57:16.8662621Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/fileopener.py::FileOpenerIterDataPipe:0, line 35 <- wrt source file 2025-09-07T06:57:16.8663890Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/fileopener.py::FileOpenerIterDataPipe:0 2025-09-07T06:57:16.8665121Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/filelister.py::FileListerIterDataPipe:0, line 30 <- wrt source file 2025-09-07T06:57:16.8666492Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/filelister.py::FileListerIterDataPipe:0 2025-09-07T06:57:16.8667794Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ConcaterIterDataPipe:0, line 38 <- wrt source file 2025-09-07T06:57:16.8700991Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ConcaterIterDataPipe:0 2025-09-07T06:57:16.8703057Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ForkerIterDataPipe:0, line 88 <- wrt source file 2025-09-07T06:57:16.8705146Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ForkerIterDataPipe:0 2025-09-07T06:57:16.8707157Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::_ChildDataPipe:0, line 304 <- wrt source file 2025-09-07T06:57:16.8708836Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::_ChildDataPipe:0 2025-09-07T06:57:16.8710054Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::DemultiplexerIterDataPipe:0, line 390 <- wrt source file 2025-09-07T06:57:16.8711349Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::DemultiplexerIterDataPipe:0 2025-09-07T06:57:16.8712598Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::MultiplexerIterDataPipe:0, line 604 <- wrt source file 2025-09-07T06:57:16.8713860Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::MultiplexerIterDataPipe:0 2025-09-07T06:57:16.8715171Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ZipperIterDataPipe:0, line 674 <- wrt source file 2025-09-07T06:57:16.8716403Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ZipperIterDataPipe:0 2025-09-07T06:57:16.8717591Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::MapperIterDataPipe:0, line 53 <- wrt source file 2025-09-07T06:57:16.8718795Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::MapperIterDataPipe:0 2025-09-07T06:57:16.8719980Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::CollatorIterDataPipe:0, line 201 <- wrt source file 2025-09-07T06:57:16.8721208Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::CollatorIterDataPipe:0 2025-09-07T06:57:16.8722435Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combinatorics.py::ShufflerIterDataPipe:0, line 90 <- wrt source file 2025-09-07T06:57:16.8723719Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combinatorics.py::ShufflerIterDataPipe:0 2025-09-07T06:57:16.8724988Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/streamreader.py::StreamReaderIterDataPipe:0, line 25 <- wrt source file 2025-09-07T06:57:16.8726285Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/streamreader.py::StreamReaderIterDataPipe:0 2025-09-07T06:57:16.8727641Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/utils.py::IterableWrapperIterDataPipe:0, line 29 <- wrt source file 2025-09-07T06:57:16.8728891Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/utils.py::IterableWrapperIterDataPipe:0 2025-09-07T06:57:16.8730273Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/grouping.py::BatcherMapDataPipe:0, line 29 <- wrt source file 2025-09-07T06:57:16.8731483Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/grouping.py::BatcherMapDataPipe:0 2025-09-07T06:57:16.8732654Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ConcaterMapDataPipe:0, line 29 <- wrt source file 2025-09-07T06:57:16.8733943Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ConcaterMapDataPipe:0 2025-09-07T06:57:16.8735117Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ZipperMapDataPipe:0, line 73 <- wrt source file 2025-09-07T06:57:16.8736312Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ZipperMapDataPipe:0 2025-09-07T06:57:16.8737475Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/callable.py::MapperMapDataPipe:0, line 35 <- wrt source file 2025-09-07T06:57:16.8738663Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/callable.py::MapperMapDataPipe:0 2025-09-07T06:57:16.8739862Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combinatorics.py::ShufflerIterDataPipe:0, line 34 <- wrt source file 2025-09-07T06:57:16.8741123Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combinatorics.py::ShufflerIterDataPipe:0 2025-09-07T06:57:16.8742434Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/utils.py::SequenceWrapperMapDataPipe:0, line 29 <- wrt source file 2025-09-07T06:57:16.8743682Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/datapipes/map/utils.py::SequenceWrapperMapDataPipe:0 2025-09-07T06:57:16.8744789Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_convert:0, line 39 <- wrt source file 2025-09-07T06:57:16.8745856Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_convert:0 2025-09-07T06:57:16.8746868Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::collate:0, line 137 <- wrt source file 2025-09-07T06:57:16.8747867Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::collate:0 2025-09-07T06:57:16.8748876Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_collate:0, line 364 <- wrt source file 2025-09-07T06:57:16.8749933Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_collate:0 2025-09-07T06:57:16.8750984Z 
* DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::find_closure_group:0, line 439 <- wrt source file 2025-09-07T06:57:17.4777121Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::find_closure_group:0 2025-09-07T06:57:17.4779557Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::replace_extern_shared:0, line 535 <- wrt source file 2025-09-07T06:57:17.4780872Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::replace_extern_shared:0 2025-09-07T06:57:17.4782154Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_sympy/functions.py::MinMaxBase._collapse_arguments:0, line 724 <- wrt source file 2025-09-07T06:57:17.5254923Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_sympy/functions.py::MinMaxBase._collapse_arguments:0 2025-09-07T06:57:17.5256965Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.__init__:0, line 216 <- wrt source file 2025-09-07T06:57:17.5258977Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.__init__:0 2025-09-07T06:57:17.5260954Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_hparams:0, line 314 <- wrt source file 2025-09-07T06:57:17.5262162Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_hparams:0 2025-09-07T06:57:17.5263316Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalar:0, line 362 <- wrt source file 2025-09-07T06:57:17.5264496Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalar:0 2025-09-07T06:57:17.5265615Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalars:0, line 394 <- wrt source file 2025-09-07T06:57:17.5266764Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalars:0 2025-09-07T06:57:17.5268084Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_tensor:0, line 441 <- wrt source file 2025-09-07T06:57:17.5269253Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_tensor:0 2025-09-07T06:57:17.5270389Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram:0, line 480 <- wrt source file 2025-09-07T06:57:17.5271558Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram:0 2025-09-07T06:57:17.5272716Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram_raw:0, line 533 <- wrt source file 2025-09-07T06:57:17.5273909Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram_raw:0 2025-09-07T06:57:17.5275066Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_image:0, line 599 <- wrt source file 2025-09-07T06:57:17.5276228Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_image:0 2025-09-07T06:57:17.5277354Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_images:0, line 
648 <- wrt source file 2025-09-07T06:57:17.5278500Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_images:0 2025-09-07T06:57:17.5279715Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_text:0, line 811 <- wrt source file 2025-09-07T06:57:17.5280846Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_text:0 2025-09-07T06:57:17.5282053Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_embedding:0, line 878 <- wrt source file 2025-09-07T06:57:17.5283299Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_embedding:0 2025-09-07T06:57:17.5284440Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_pr_curve:0, line 989 <- wrt source file 2025-09-07T06:57:17.5285603Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_pr_curve:0 2025-09-07T06:57:17.5286865Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_multilinechart:0, line 1063 <- wrt source file 2025-09-07T06:57:17.5288201Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_multilinechart:0 2025-09-07T06:57:17.5289507Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_marginchart:0, line 1084 <- wrt source file 2025-09-07T06:57:17.5290811Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_marginchart:0 
2025-09-07T06:57:17.5292053Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars:0, line 1108 <- wrt source file 2025-09-07T06:57:17.5293268Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars:0 2025-09-07T06:57:17.5294600Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_mesh:0, line 1154 <- wrt source file 2025-09-07T06:57:17.5295745Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_mesh:0 2025-09-07T06:57:17.5296924Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_export/wrappers.py::mark_subclass_constructor_exportable_experimental:0, line 158 <- wrt source file 2025-09-07T06:57:17.5298169Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_export/wrappers.py::mark_subclass_constructor_exportable_experimental:0 2025-09-07T06:57:17.5299324Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_export/utils.py::register_module_as_pytree_input_node:0, line 1410 <- wrt source file 2025-09-07T06:57:17.5300437Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_export/utils.py::register_module_as_pytree_input_node:0 2025-09-07T06:57:17.5301456Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::profile:0, line 182 <- wrt source file 2025-09-07T06:57:17.5302411Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::profile:0 2025-09-07T06:57:17.5303373Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::record_function:0, line 745 <- wrt source file 2025-09-07T06:57:17.5304481Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::record_function:0 2025-09-07T06:57:17.5305430Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_itt:0, line 880 <- wrt source file 2025-09-07T06:57:17.5306449Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_itt:0 2025-09-07T06:57:17.5307452Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_nvtx:0, line 953 <- wrt source file 2025-09-07T06:57:17.5308405Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_nvtx:0 2025-09-07T06:57:17.5309331Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/forward_ad.py::make_dual:0, line 82 <- wrt source file 2025-09-07T06:57:17.5310281Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/forward_ad.py::make_dual:0 2025-09-07T06:57:17.5311233Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/forward_ad.py::unpack_dual:0, line 151 <- wrt source file 2025-09-07T06:57:17.5312207Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/forward_ad.py::unpack_dual:0 2025-09-07T06:57:17.5313160Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/forward_ad.py::dual_level:0, line 187 <- wrt source file 2025-09-07T06:57:17.5314135Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/forward_ad.py::dual_level:0 2025-09-07T06:57:17.5315106Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/anomaly_mode.py::detect_anomaly:0, line 28 <- wrt source file 2025-09-07T06:57:17.5316125Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/anomaly_mode.py::detect_anomaly:0 2025-09-07T06:57:17.5317076Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::no_grad:0, line 50 <- wrt source file 2025-09-07T06:57:17.5318079Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::no_grad:0 2025-09-07T06:57:17.5319017Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::enable_grad:0, line 108 <- wrt source file 2025-09-07T06:57:17.5319981Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::enable_grad:0 2025-09-07T06:57:17.5320937Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::set_grad_enabled:0, line 166 <- wrt source file 2025-09-07T06:57:17.5321968Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::set_grad_enabled:0 2025-09-07T06:57:17.5322945Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::inference_mode:0, line 246 <- wrt source file 2025-09-07T06:57:17.5323937Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/grad_mode.py::inference_mode:0 2025-09-07T06:57:17.5324886Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::vjp:0, line 293 <- wrt source file 2025-09-07T06:57:17.5325820Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::vjp:0 2025-09-07T06:57:17.5326725Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::jvp:0, line 395 <- wrt source file 2025-09-07T06:57:17.5327650Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::jvp:0 2025-09-07T06:57:17.5328661Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::jacobian:0, line 630 <- wrt source file 2025-09-07T06:57:17.5329623Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::jacobian:0 2025-09-07T06:57:17.5330677Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::hessian:0, line 894 <- wrt source file 2025-09-07T06:57:17.5331697Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::hessian:0 2025-09-07T06:57:17.5332624Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::vhp:0, line 1010 <- wrt source file 2025-09-07T06:57:17.5333538Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::vhp:0 2025-09-07T06:57:17.5334513Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::hvp:0, line 1109 <- wrt source file 2025-09-07T06:57:17.5335439Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/functional.py::hvp:0 2025-09-07T06:57:17.5336449Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_backward:0, line 71 <- wrt source file 2025-09-07T06:57:17.5337569Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_backward:0 2025-09-07T06:57:17.5338655Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_forward:0, line 115 <- wrt source file 2025-09-07T06:57:17.5339757Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_forward:0 2025-09-07T06:57:17.5340829Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_dirty:0, line 167 <- wrt source file 2025-09-07T06:57:17.5341897Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_dirty:0 2025-09-07T06:57:17.5343097Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_non_differentiable:0, line 214 <- wrt source file 2025-09-07T06:57:17.5344271Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_non_differentiable:0 2025-09-07T06:57:17.5345405Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.set_materialize_grads:0, line 243 <- wrt source file 2025-09-07T06:57:17.5346545Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.set_materialize_grads:0 2025-09-07T06:57:17.5347567Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::Function:0, line 485 <- wrt source file 2025-09-07T06:57:17.5348525Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py::Function:0 2025-09-07T06:57:17.5349448Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::Node.name:0, line 53 <- wrt source file 2025-09-07T06:57:17.5350356Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::Node.name:0 2025-09-07T06:57:17.5351285Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::Node.register_hook:0, line 110 <- wrt source file 2025-09-07T06:57:17.5352275Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::Node.register_hook:0 2025-09-07T06:57:17.5353355Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::Node.register_prehook:0, line 147 <- wrt source file 2025-09-07T06:57:17.5367927Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::Node.register_prehook:0 2025-09-07T06:57:17.5369098Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::saved_tensors_hooks:0, 
line 283 <- wrt source file 2025-09-07T06:57:17.5370224Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::saved_tensors_hooks:0 2025-09-07T06:57:17.5371581Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::save_on_cpu:0, line 353 <- wrt source file 2025-09-07T06:57:17.5373214Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::save_on_cpu:0 2025-09-07T06:57:17.5375019Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::disable_saved_tensors_hooks:0, line 410 <- wrt source file 2025-09-07T06:57:17.5376849Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::disable_saved_tensors_hooks:0 2025-09-07T06:57:17.5378604Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::register_multi_grad_hook:0, line 487 <- wrt source file 2025-09-07T06:57:17.5385613Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::register_multi_grad_hook:0 2025-09-07T06:57:17.5386672Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::allow_mutation_on_saved_tensors:0, line 753 <- wrt source file 2025-09-07T06:57:17.5404216Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py::allow_mutation_on_saved_tensors:0 2025-09-07T06:57:17.5405333Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/_check.py::AttributeTypeIsSupportedChecker:0, line 36 <- wrt source file 2025-09-07T06:57:17.5406549Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/_check.py::AttributeTypeIsSupportedChecker:0 2025-09-07T06:57:17.5407617Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_load_for_lite_interpreter:0, line 22 <- wrt source file 2025-09-07T06:57:17.5408700Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_load_for_lite_interpreter:0 2025-09-07T06:57:17.5409814Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_mobile_model_contained_types:0, line 122 <- wrt source file 2025-09-07T06:57:17.5410951Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_mobile_model_contained_types:0 2025-09-07T06:57:17.5412005Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_ops_and_info:0, line 214 <- wrt source file 2025-09-07T06:57:17.5413051Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_ops_and_info:0 2025-09-07T06:57:17.5414103Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_creation.py::make_tensor:0, line 114 <- wrt source file 2025-09-07T06:57:17.5415071Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_creation.py::make_tensor:0 2025-09-07T06:57:17.5416019Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_comparison.py::assert_close:0, line 1466 <- wrt source file 2025-09-07T06:57:17.5469277Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_comparison.py::assert_close:0 2025-09-07T06:57:17.5471047Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::parametrize:0, line 615 <- wrt source file 2025-09-07T06:57:17.5473078Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::parametrize:0 2025-09-07T06:57:17.5475153Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::reparametrize:0, line 736 <- wrt source file 2025-09-07T06:57:17.5477007Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::reparametrize:0 2025-09-07T06:57:17.5478789Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::decorateIf:0, line 825 <- wrt source file 2025-09-07T06:57:17.5480676Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::decorateIf:0 2025-09-07T06:57:17.5482894Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_symmetric_psd_matrix:0, line 4734 <- wrt source file 2025-09-07T06:57:17.5484886Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_symmetric_psd_matrix:0 2025-09-07T06:57:17.5486030Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_psd_matrix:0, line 4748 <- wrt source file 2025-09-07T06:57:17.5487198Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_psd_matrix:0 2025-09-07T06:57:17.5488345Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_pd_matrix:0, line 4778 <- wrt source file 2025-09-07T06:57:17.5489587Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_pd_matrix:0 2025-09-07T06:57:17.5490691Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/logging_utils.py::logs_to_string:0, line 194 <- wrt source file 2025-09-07T06:57:17.5491790Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/logging_utils.py::logs_to_string:0 2025-09-07T06:57:17.5492878Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/logging_utils.py::multiple_logs_to_string:0, line 220 <- wrt source file 2025-09-07T06:57:17.5494118Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/logging_utils.py::multiple_logs_to_string:0 2025-09-07T06:57:17.5495339Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/optests/autograd_registration.py::autograd_registration_check:0, line 29 <- wrt source file 2025-09-07T06:57:17.5496682Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/optests/autograd_registration.py::autograd_registration_check:0 2025-09-07T06:57:17.5497984Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py::skip_unless_torch_gpu:0, line 331 <- wrt source file 2025-09-07T06:57:17.5499291Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py::skip_unless_torch_gpu:0 2025-09-07T06:57:17.5500594Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/qat/modules/linear_relu.py::LinearReLU:0, line 30 <- wrt source file 2025-09-07T06:57:17.5501767Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/qat/modules/linear_relu.py::LinearReLU:0 2025-09-07T06:57:17.5503003Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearReLU:0, line 25 <- wrt source file 2025-09-07T06:57:17.5504303Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearReLU:0 2025-09-07T06:57:17.5505505Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearLeakyReLU:0, line 67 <- wrt source file 
2025-09-07T06:57:17.5506757Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearLeakyReLU:0 2025-09-07T06:57:17.5507955Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearTanh:0, line 142 <- wrt source file 2025-09-07T06:57:17.5509157Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearTanh:0 2025-09-07T06:57:17.5510375Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py::LinearReLU:0, line 24 <- wrt source file 2025-09-07T06:57:17.5511656Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py::LinearReLU:0 2025-09-07T06:57:17.5512779Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTMCell:0, line 30 <- wrt source file 2025-09-07T06:57:17.5574729Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTMCell:0 2025-09-07T06:57:17.5575799Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTM:0, line 413 <- wrt source file 2025-09-07T06:57:17.5576946Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTM:0 2025-09-07T06:57:17.5577955Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv1d:0, line 211 <- wrt source file 2025-09-07T06:57:17.5578994Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv1d:0 2025-09-07T06:57:17.5579982Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv2d:0, line 283 <- wrt source file 
2025-09-07T06:57:17.5580988Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv2d:0 2025-09-07T06:57:17.5581960Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv3d:0, line 359 <- wrt source file 2025-09-07T06:57:17.5582953Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv3d:0 2025-09-07T06:57:17.5583955Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::Quantize:0, line 95 <- wrt source file 2025-09-07T06:57:17.5585022Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::Quantize:0 2025-09-07T06:57:17.5586060Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::DeQuantize:0, line 145 <- wrt source file 2025-09-07T06:57:17.5587268Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::DeQuantize:0 2025-09-07T06:57:17.5588421Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::FloatFunctional:0, line 23 <- wrt source file 2025-09-07T06:57:17.5589745Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::FloatFunctional:0 2025-09-07T06:57:17.5591015Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::QFunctional:0, line 176 <- wrt source file 2025-09-07T06:57:17.5592229Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::QFunctional:0 2025-09-07T06:57:17.5593309Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv1d:0, line 376 <- wrt source file 
2025-09-07T06:57:17.5594331Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv1d:0 2025-09-07T06:57:17.5595325Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv2d:0, line 505 <- wrt source file 2025-09-07T06:57:17.5596334Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv2d:0 2025-09-07T06:57:17.5597315Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv3d:0, line 635 <- wrt source file 2025-09-07T06:57:17.5598320Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv3d:0 2025-09-07T06:57:17.5599360Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose1d:0, line 892 <- wrt source file 2025-09-07T06:57:17.5600458Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose1d:0 2025-09-07T06:57:17.5601616Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose2d:0, line 1014 <- wrt source file 2025-09-07T06:57:17.5602739Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose2d:0 2025-09-07T06:57:17.5603819Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose3d:0, line 1140 <- wrt source file 2025-09-07T06:57:17.5604918Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose3d:0 2025-09-07T06:57:17.5606010Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::Embedding:0, line 111 <- wrt source file 2025-09-07T06:57:17.5623720Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::Embedding:0 2025-09-07T06:57:17.5624935Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::EmbeddingBag:0, line 275 <- wrt source file 2025-09-07T06:57:17.5635591Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::EmbeddingBag:0 2025-09-07T06:57:17.5636722Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/linear.py::Linear:0, line 138 <- wrt source file 2025-09-07T06:57:17.5637796Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/linear.py::Linear:0 2025-09-07T06:57:17.5639001Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/activation.py::ReLU6:0, line 36 <- wrt source file 2025-09-07T06:57:17.5641268Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/activation.py::ReLU6:0 2025-09-07T06:57:17.5642434Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/rnn.py::LSTM:0, line 24 <- wrt source file 2025-09-07T06:57:17.5643538Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/rnn.py::LSTM:0 2025-09-07T06:57:17.5644572Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv1d:0, line 43 <- wrt source file 2025-09-07T06:57:17.5645681Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv1d:0 2025-09-07T06:57:17.5646758Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv2d:0, line 124 <- wrt source file 2025-09-07T06:57:17.5647879Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv2d:0 2025-09-07T06:57:17.5648987Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv3d:0, line 209 <- wrt source file 2025-09-07T06:57:17.5650088Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv3d:0 2025-09-07T06:57:17.5651220Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose1d:0, line 296 <- wrt source file 2025-09-07T06:57:17.5652407Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose1d:0 2025-09-07T06:57:17.5653679Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose2d:0, line 378 <- wrt source file 2025-09-07T06:57:17.5654983Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose2d:0 2025-09-07T06:57:17.5656126Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose3d:0, line 460 <- wrt source file 2025-09-07T06:57:17.5657294Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose3d:0 2025-09-07T06:57:17.5658404Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py::Linear:0, line 30 <- wrt source file 2025-09-07T06:57:17.5659535Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py::Linear:0 2025-09-07T06:57:17.5660596Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTM:0, line 515 <- wrt source file 
2025-09-07T06:57:17.5661660Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTM:0 2025-09-07T06:57:17.5662693Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRU:0, line 801 <- wrt source file 2025-09-07T06:57:17.5663747Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRU:0 2025-09-07T06:57:17.5664907Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::RNNCell:0, line 1206 <- wrt source file 2025-09-07T06:57:17.5665997Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::RNNCell:0 2025-09-07T06:57:17.5667148Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTMCell:0, line 1273 <- wrt source file 2025-09-07T06:57:17.5668350Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTMCell:0 2025-09-07T06:57:17.5669422Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRUCell:0, line 1326 <- wrt source file 2025-09-07T06:57:17.5670509Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRUCell:0 2025-09-07T06:57:17.5671582Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/scheduler/lambda_scheduler.py::LambdaSL:0, line 24 <- wrt source file 2025-09-07T06:57:17.5676570Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/scheduler/lambda_scheduler.py::LambdaSL:0 2025-09-07T06:57:17.5677835Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py::BaseDataSparsifier:0, line 55 <- wrt source 
file 2025-09-07T06:57:17.5679228Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py::BaseDataSparsifier:0 2025-09-07T06:57:17.5680642Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py::BaseDataScheduler.get_schedule_param:0, line 98 <- wrt source file 2025-09-07T06:57:17.5695010Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py::BaseDataScheduler.get_schedule_param:0 2025-09-07T06:57:17.5696474Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier:0, line 47 <- wrt source file 2025-09-07T06:57:17.5697687Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier:0 2025-09-07T06:57:17.5698894Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier.squash_mask:0, line 245 <- wrt source file 2025-09-07T06:57:17.5700798Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier.squash_mask:0 2025-09-07T06:57:17.5701913Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::fuse_fx:0, line 218 <- wrt source file 2025-09-07T06:57:17.5702925Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::fuse_fx:0 2025-09-07T06:57:17.5703947Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_fx:0, line 288 <- wrt source file 2025-09-07T06:57:17.5704991Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_fx:0 2025-09-07T06:57:17.5706028Z * 
DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_qat_fx:0, line 427 <- wrt source file 2025-09-07T06:57:17.5707096Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_qat_fx:0 2025-09-07T06:57:17.5708244Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_fx:0, line 608 <- wrt source file 2025-09-07T06:57:17.5709275Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_fx:0 2025-09-07T06:57:17.5710419Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_to_reference_fx:0, line 668 <- wrt source file 2025-09-07T06:57:17.5711645Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_to_reference_fx:0 2025-09-07T06:57:17.5712797Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::_convert_to_reference_decomposed_fx:0, line 720 <- wrt source file 2025-09-07T06:57:17.5714027Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::_convert_to_reference_decomposed_fx:0 2025-09-07T06:57:17.5715145Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuse_modules.py::fuse_modules:0, line 176 <- wrt source file 2025-09-07T06:57:17.5716204Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuse_modules.py::fuse_modules:0 2025-09-07T06:57:17.5717278Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn:0, line 31 <- wrt source file 2025-09-07T06:57:17.5719753Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn:0 
2025-09-07T06:57:17.5720922Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn_relu:0, line 76 <- wrt source file 2025-09-07T06:57:17.5726650Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn_relu:0 2025-09-07T06:57:17.5727899Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_linear_bn:0, line 130 <- wrt source file 2025-09-07T06:57:17.5732456Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_linear_bn:0 2025-09-07T06:57:17.5733637Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_convtranspose_bn:0, line 163 <- wrt source file 2025-09-07T06:57:17.5738810Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_convtranspose_bn:0 2025-09-07T06:57:17.5739908Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_args:0, line 110 <- wrt source file 2025-09-07T06:57:17.5740918Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_args:0 2025-09-07T06:57:17.5741948Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_callable_args:0, line 132 <- wrt source file 2025-09-07T06:57:17.5743036Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_callable_args:0 2025-09-07T06:57:17.5744083Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_pt2e.py::prepare_pt2e:0, line 51 <- wrt source file 2025-09-07T06:57:17.5745146Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_pt2e.py::prepare_pt2e:0 2025-09-07T06:57:17.5746336Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_pt2e.py::prepare_qat_pt2e:0, line 130 <- wrt source file 2025-09-07T06:57:17.5747436Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_pt2e.py::prepare_qat_pt2e:0 2025-09-07T06:57:17.5748573Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_pt2e.py::convert_pt2e:0, line 228 <- wrt source file 2025-09-07T06:57:17.5749712Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/quantize_pt2e.py::convert_pt2e:0 2025-09-07T06:57:17.5750734Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::get_combined_dict:0, line 172 <- wrt source file 2025-09-07T06:57:17.5751802Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::get_combined_dict:0 2025-09-07T06:57:17.5752826Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_path_of_module:0, line 544 <- wrt source file 2025-09-07T06:57:17.5753868Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_path_of_module:0 2025-09-07T06:57:17.5754904Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_signature_locals:0, line 566 <- wrt source file 2025-09-07T06:57:17.5755967Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_signature_locals:0 2025-09-07T06:57:17.5757003Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_default_kwargs:0, line 580 <- wrt source file 2025-09-07T06:57:17.5758041Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_default_kwargs:0 2025-09-07T06:57:17.5759057Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_normalize_kwargs:0, line 602 <- wrt source file 2025-09-07T06:57:17.5760192Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_normalize_kwargs:0 2025-09-07T06:57:17.5761203Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_num_pos_args:0, line 729 <- wrt source file 2025-09-07T06:57:17.5762227Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_num_pos_args:0 2025-09-07T06:57:17.5763340Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/_affine_quantization.py::_get_reduction_params:0, line 102 <- wrt source file 2025-09-07T06:57:17.5764584Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/_affine_quantization.py::_get_reduction_params:0 2025-09-07T06:57:17.5765778Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/_affine_quantization.py::_register_custom_op:0, line 148 <- wrt source file 2025-09-07T06:57:17.5766990Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/_affine_quantization.py::_register_custom_op:0 2025-09-07T06:57:17.5768160Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/prepare.py::_get_edge_or_node_to_group_id:0, line 189 <- wrt source file 2025-09-07T06:57:17.5769355Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/prepare.py::_get_edge_or_node_to_group_id:0 2025-09-07T06:57:17.5770653Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/utils.py::_replace_literals_with_new_placeholders:0, line 436 <- wrt source file 2025-09-07T06:57:17.5771914Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/pt2e/utils.py::_replace_literals_with_new_placeholders:0 2025-09-07T06:57:17.5773186Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/backend_config/backend_config.py::DTypeConfig:0, line 214 <- wrt source file 2025-09-07T06:57:17.5774558Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/backend_config/backend_config.py::DTypeConfig:0 2025-09-07T06:57:17.5775735Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/backend_config/onednn.py::_fuse_linear_bn_leaky_relu:0, line 85 <- wrt source file 2025-09-07T06:57:17.5776964Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/backend_config/onednn.py::_fuse_linear_bn_leaky_relu:0 2025-09-07T06:57:17.5778152Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report.py::ModelReport:0, line 84 <- wrt source file 2025-09-07T06:57:17.5779351Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report.py::ModelReport:0 2025-09-07T06:57:17.5780708Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_filtered_tables:0, line 339 <- wrt source file 2025-09-07T06:57:17.5782292Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_filtered_tables:0 2025-09-07T06:57:17.5783842Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_table_visualization:0, line 427 <- wrt source file 2025-09-07T06:57:17.5785524Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_table_visualization:0 2025-09-07T06:57:17.5787086Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_plot_visualization:0, line 589 <- wrt source file 2025-09-07T06:57:17.5788669Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_plot_visualization:0 2025-09-07T06:57:17.5790249Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_histogram_visualization:0, line 662 <- wrt source file 2025-09-07T06:57:17.5791879Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_histogram_visualization:0 2025-09-07T06:57:17.5793142Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_equal:0, line 171 <- wrt source file 2025-09-07T06:57:17.5807759Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_equal:0 2025-09-07T06:57:17.5808798Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::print_assert_equal:0, line 302 <- wrt source file 2025-09-07T06:57:17.5809853Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::print_assert_equal:0 2025-09-07T06:57:17.5810989Z * DOCTEST 
: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_almost_equal:0, line 375 <- wrt source file 2025-09-07T06:57:17.5858549Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_almost_equal:0 2025-09-07T06:57:17.5860485Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_approx_equal:0, line 496 <- wrt source file 2025-09-07T06:57:17.5862698Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_approx_equal:0 2025-09-07T06:57:17.5864729Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_equal:0, line 793 <- wrt source file 2025-09-07T06:57:17.5933040Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_equal:0 2025-09-07T06:57:17.5934967Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_almost_equal:0, line 899 <- wrt source file 2025-09-07T06:57:17.5997546Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_almost_equal:0 2025-09-07T06:57:17.5999331Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_less:0, line 1008 <- wrt source file 2025-09-07T06:57:17.6053133Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_less:0 2025-09-07T06:57:17.6054990Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_string_equal:0, line 1073 <- wrt source file 2025-09-07T06:57:17.6056772Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_string_equal:0 2025-09-07T06:57:17.6058482Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_allclose:0, line 1294 <- wrt source file 2025-09-07T06:57:17.6074478Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_allclose:0 2025-09-07T06:57:17.6075539Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_almost_equal_nulp:0, line 1360 <- wrt source file 2025-09-07T06:57:17.6079055Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_almost_equal_nulp:0 2025-09-07T06:57:17.6080114Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_max_ulp:0, line 1423 <- wrt source file 2025-09-07T06:57:17.6083576Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_array_max_ulp:0 2025-09-07T06:57:17.6084590Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::nulp_diff:0, line 1468 <- wrt source file 2025-09-07T06:57:17.6085578Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::nulp_diff:0 2025-09-07T06:57:17.6086532Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_warns:0, line 1578 <- wrt source file 2025-09-07T06:57:17.6090374Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::assert_warns:0 2025-09-07T06:57:17.6091382Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::clear_and_catch_warnings:0, line 1881 <- wrt source file 2025-09-07T06:57:17.6093532Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_numpy/testing/utils.py::clear_and_catch_warnings:0 2025-09-07T06:57:17.6094649Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/exponential.py::Exponential:0, line 20 <- wrt source file 2025-09-07T06:57:17.6098185Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/exponential.py::Exponential:0 2025-09-07T06:57:17.6099318Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/categorical.py::Categorical:0, line 42 <- wrt source file 2025-09-07T06:57:17.6103250Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/categorical.py::Categorical:0 2025-09-07T06:57:17.6104250Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/cauchy.py::Cauchy:0, line 23 <- wrt source file 2025-09-07T06:57:17.6107642Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/cauchy.py::Cauchy:0 2025-09-07T06:57:17.6108602Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/von_mises.py::VonMises:0, line 117 <- wrt source file 2025-09-07T06:57:17.6116314Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/von_mises.py::VonMises:0 2025-09-07T06:57:17.6117458Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/lowrank_multivariate_normal.py::LowRankMultivariateNormal:0, line 63 <- wrt source file 2025-09-07T06:57:17.6118758Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/lowrank_multivariate_normal.py::LowRankMultivariateNormal:0 2025-09-07T06:57:17.6119921Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/logistic_normal.py::LogisticNormal:0, line 28 <- wrt source file 2025-09-07T06:57:17.6125153Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/logistic_normal.py::LogisticNormal:0 2025-09-07T06:57:17.6126261Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/gamma.py::Gamma:0, line 24 <- wrt source file 2025-09-07T06:57:17.6129783Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/gamma.py::Gamma:0 2025-09-07T06:57:17.6130739Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/log_normal.py::LogNormal:0, line 23 <- wrt source file 2025-09-07T06:57:17.6134772Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/log_normal.py::LogNormal:0 2025-09-07T06:57:17.6135737Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/uniform.py::Uniform:0, line 21 <- wrt source file 2025-09-07T06:57:17.6139294Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/uniform.py::Uniform:0 2025-09-07T06:57:17.6140313Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/transforms.py::CatTransform:0, line 1065 <- wrt source file 2025-09-07T06:57:17.6141367Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/transforms.py::CatTransform:0 2025-09-07T06:57:17.6142409Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/transforms.py::StackTransform:0, line 1177 <- wrt source file 2025-09-07T06:57:17.6143473Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/transforms.py::StackTransform:0 2025-09-07T06:57:17.6144735Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/transforms.py::CumulativeDistributionTransform:0, line 1253 <- wrt source file 2025-09-07T06:57:17.6145959Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/transforms.py::CumulativeDistributionTransform:0 2025-09-07T06:57:17.6147129Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/studentT.py::StudentT:0, line 22 
<- wrt source file 2025-09-07T06:57:17.6148196Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/studentT.py::StudentT:0 2025-09-07T06:57:17.6149185Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/independent.py::Independent:0, line 27 <- wrt source file 2025-09-07T06:57:17.6156277Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/independent.py::Independent:0 2025-09-07T06:57:17.6157316Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/kumaraswamy.py::Kumaraswamy:0, line 30 <- wrt source file 2025-09-07T06:57:17.6162420Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/kumaraswamy.py::Kumaraswamy:0 2025-09-07T06:57:17.6163445Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/constraints.py::is_dependent:0, line 166 <- wrt source file 2025-09-07T06:57:17.6169130Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/constraints.py::is_dependent:0 2025-09-07T06:57:17.6170220Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/constraints.py::_DependentProperty:0, line 187 <- wrt source file 2025-09-07T06:57:17.6171333Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/constraints.py::_DependentProperty:0 2025-09-07T06:57:17.6172341Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/gumbel.py::Gumbel:0, line 23 <- wrt source file 2025-09-07T06:57:17.6176291Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/gumbel.py::Gumbel:0 2025-09-07T06:57:17.6177260Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/wishart.py::Wishart:0, line 39 <- wrt source file 2025-09-07T06:57:17.6178246Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/wishart.py::Wishart:0 2025-09-07T06:57:17.6179297Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py::MultivariateNormal:0, line 103 <- wrt source file 2025-09-07T06:57:17.6180495Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py::MultivariateNormal:0 2025-09-07T06:57:17.6181507Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/beta.py::Beta:0, line 21 <- wrt source file 2025-09-07T06:57:17.6183309Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/beta.py::Beta:0 2025-09-07T06:57:17.6184403Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/relaxed_categorical.py::RelaxedOneHotCategorical:0, line 116 <- wrt source file 2025-09-07T06:57:17.6190128Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/relaxed_categorical.py::RelaxedOneHotCategorical:0 2025-09-07T06:57:17.6191299Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/fishersnedecor.py::FisherSnedecor:0, line 21 <- wrt source file 2025-09-07T06:57:17.6195287Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/fishersnedecor.py::FisherSnedecor:0 2025-09-07T06:57:17.6196339Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/multinomial.py::Multinomial:0, line 38 <- wrt source file 2025-09-07T06:57:17.6197520Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/multinomial.py::Multinomial:0 2025-09-07T06:57:17.6198571Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/weibull.py::Weibull:0, line 22 <- wrt source file 2025-09-07T06:57:17.6202240Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/weibull.py::Weibull:0 2025-09-07T06:57:17.6203281Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/generalized_pareto.py::GeneralizedPareto:0, line 26 <- wrt source file 2025-09-07T06:57:17.6208197Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/generalized_pareto.py::GeneralizedPareto:0 2025-09-07T06:57:17.6209272Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/bernoulli.py::Bernoulli:0, line 30 <- wrt source file 2025-09-07T06:57:17.6212473Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/bernoulli.py::Bernoulli:0 2025-09-07T06:57:17.6213435Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/pareto.py::Pareto:0, line 20 <- wrt source file 2025-09-07T06:57:17.6217716Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/pareto.py::Pareto:0 2025-09-07T06:57:17.6218684Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/poisson.py::Poisson:0, line 25 <- wrt source file 2025-09-07T06:57:17.6219674Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/poisson.py::Poisson:0 2025-09-07T06:57:17.6220608Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/normal.py::Normal:0, line 22 <- wrt source file 2025-09-07T06:57:17.6223005Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/normal.py::Normal:0 2025-09-07T06:57:17.6223959Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/laplace.py::Laplace:0, line 20 <- wrt source file 2025-09-07T06:57:17.6227325Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/laplace.py::Laplace:0 2025-09-07T06:57:17.6228304Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/geometric.py::Geometric:0, line 36 <- wrt source file 2025-09-07T06:57:17.6231536Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/geometric.py::Geometric:0 2025-09-07T06:57:17.6232621Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/continuous_bernoulli.py::ContinuousBernoulli:0, line 35 <- wrt source file 2025-09-07T06:57:17.6237057Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/continuous_bernoulli.py::ContinuousBernoulli:0 2025-09-07T06:57:17.6238165Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/binomial.py::Binomial:0, line 31 <- wrt source file 2025-09-07T06:57:17.6243550Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/binomial.py::Binomial:0 2025-09-07T06:57:17.6244543Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/dirichlet.py::Dirichlet:0, line 42 <- wrt source file 2025-09-07T06:57:17.6247755Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/dirichlet.py::Dirichlet:0 2025-09-07T06:57:17.6248762Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/lkj_cholesky.py::LKJCholesky:0, line 43 <- wrt source file 2025-09-07T06:57:17.6294972Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/lkj_cholesky.py::LKJCholesky:0 2025-09-07T06:57:17.6296282Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/chi2.py::Chi2:0, line 18 <- wrt source file 2025-09-07T06:57:17.6297423Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/chi2.py::Chi2:0 2025-09-07T06:57:17.6298592Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/half_cauchy.py::HalfCauchy:0, line 24 <- wrt source file 
2025-09-07T06:57:17.6299844Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/half_cauchy.py::HalfCauchy:0 2025-09-07T06:57:17.6301162Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py::MixtureSameFamily:0, line 24 <- wrt source file 2025-09-07T06:57:17.6302676Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py::MixtureSameFamily:0 2025-09-07T06:57:17.6304068Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/utils.py::clamp_probs:0, line 114 <- wrt source file 2025-09-07T06:57:17.6305373Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/utils.py::clamp_probs:0 2025-09-07T06:57:17.6306764Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/relaxed_bernoulli.py::RelaxedBernoulli:0, line 127 <- wrt source file 2025-09-07T06:57:17.6308284Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/relaxed_bernoulli.py::RelaxedBernoulli:0 2025-09-07T06:57:17.6309711Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/inverse_gamma.py::InverseGamma:0, line 24 <- wrt source file 2025-09-07T06:57:17.6311214Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/inverse_gamma.py::InverseGamma:0 2025-09-07T06:57:17.6312652Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py::OneHotCategorical:0, line 34 <- wrt source file 2025-09-07T06:57:17.6313813Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py::OneHotCategorical:0 2025-09-07T06:57:17.6314861Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/half_normal.py::HalfNormal:0, line 24 <- wrt source file 2025-09-07T06:57:17.6315902Z * 
SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributions/half_normal.py::HalfNormal:0 2025-09-07T06:57:17.6316843Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::calculate_gain:0, line 171 <- wrt source file 2025-09-07T06:57:17.6317742Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::calculate_gain:0 2025-09-07T06:57:17.6318595Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::uniform_:0, line 230 <- wrt source file 2025-09-07T06:57:17.6319447Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::uniform_:0 2025-09-07T06:57:17.6320270Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::normal_:0, line 257 <- wrt source file 2025-09-07T06:57:17.6321270Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::normal_:0 2025-09-07T06:57:17.6322119Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::trunc_normal_:0, line 292 <- wrt source file 2025-09-07T06:57:17.6323007Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::trunc_normal_:0 2025-09-07T06:57:17.6323938Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::constant_:0, line 306 <- wrt source file 2025-09-07T06:57:17.6324859Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::constant_:0 2025-09-07T06:57:17.6325672Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::ones_:0, line 323 <- wrt source file 2025-09-07T06:57:17.6326504Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::ones_:0 2025-09-07T06:57:17.6327306Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::zeros_:0, line 336 <- wrt source file 2025-09-07T06:57:17.6328131Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::zeros_:0 2025-09-07T06:57:17.6328930Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::eye_:0, line 352 <- wrt source file 2025-09-07T06:57:17.6330185Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::eye_:0 2025-09-07T06:57:17.6330986Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::dirac_:0, line 374 <- wrt source file 2025-09-07T06:57:17.6335595Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::dirac_:0 2025-09-07T06:57:17.6336466Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::xavier_uniform_:0, line 460 <- wrt source file 2025-09-07T06:57:17.6339470Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::xavier_uniform_:0 2025-09-07T06:57:17.6350189Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::xavier_normal_:0, line 492 <- wrt source file 2025-09-07T06:57:17.6351350Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::xavier_normal_:0 2025-09-07T06:57:17.6352313Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::kaiming_uniform_:0, line 543 <- wrt source file 2025-09-07T06:57:17.6353257Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::kaiming_uniform_:0 2025-09-07T06:57:17.6354164Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::kaiming_normal_:0, line 608 <- wrt source file 2025-09-07T06:57:17.6355107Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::kaiming_normal_:0 2025-09-07T06:57:17.6356008Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::orthogonal_:0, line 647 <- wrt source file 2025-09-07T06:57:17.6356917Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::orthogonal_:0 2025-09-07T06:57:17.6357779Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::sparse_:0, line 700 <- wrt source file 2025-09-07T06:57:17.6358650Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/init.py::sparse_:0 2025-09-07T06:57:17.6359494Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_input:0, line 32 <- wrt source file 2025-09-07T06:57:17.6365580Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_input:0 2025-09-07T06:57:17.6366625Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_weight:0, line 79 <- wrt source file 2025-09-07T06:57:17.6370636Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_weight:0 2025-09-07T06:57:17.6371719Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_input:0, line 130 <- wrt source file 2025-09-07T06:57:17.6418130Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_input:0 2025-09-07T06:57:17.6419343Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_weight:0, line 177 <- wrt source file 2025-09-07T06:57:17.6423476Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_weight:0 2025-09-07T06:57:17.6424386Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_input:0, line 228 <- wrt source file 2025-09-07T06:57:17.6490337Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_input:0 2025-09-07T06:57:17.6491479Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_weight:0, line 275 <- wrt source file 2025-09-07T06:57:17.6513128Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_weight:0 2025-09-07T06:57:17.6514675Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool2d_with_indices:0, line 460 <- wrt source file 2025-09-07T06:57:17.6547757Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool2d_with_indices:0 2025-09-07T06:57:17.6549426Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool3d_with_indices:0, line 579 <- wrt source file 2025-09-07T06:57:17.7167596Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool3d_with_indices:0 2025-09-07T06:57:17.7203378Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::gumbel_softmax:0, line 2174 <- wrt source file 2025-09-07T06:57:17.7213090Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::gumbel_softmax:0 2025-09-07T06:57:17.7214544Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::embedding:0, line 2478 <- wrt source file 2025-09-07T06:57:17.7222955Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::embedding:0 2025-09-07T06:57:17.7224375Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::embedding_bag:0, line 2618 <- wrt source file 2025-09-07T06:57:17.7235391Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::embedding_bag:0 2025-09-07T06:57:17.7236759Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::ctc_loss:0, line 3051 <- wrt source file 2025-09-07T06:57:17.7249584Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::ctc_loss:0 2025-09-07T06:57:17.7250932Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::nll_loss:0, line 3121 <- wrt source file 2025-09-07T06:57:17.7257424Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::nll_loss:0 2025-09-07T06:57:17.7258776Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::cross_entropy:0, line 3430 <- wrt source file 2025-09-07T06:57:17.7268204Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::cross_entropy:0 2025-09-07T06:57:17.7269662Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy:0, line 3495 <- wrt source file 2025-09-07T06:57:17.7275037Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy:0 2025-09-07T06:57:17.7276686Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy_with_logits:0, line 3565 <- wrt source file 2025-09-07T06:57:17.7284006Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy_with_logits:0 2025-09-07T06:57:17.7285472Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::pad:0, line 5263 <- wrt source file 2025-09-07T06:57:17.7295734Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py::pad:0 2025-09-07T06:57:17.7297143Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/__init__.py::sdpa_kernel:0, line 120 <- wrt source file 2025-09-07T06:57:17.7298628Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/__init__.py::sdpa_kernel:0 2025-09-07T06:57:17.7300099Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/stateless.py::functional_call:0, line 196 <- wrt source file 2025-09-07T06:57:17.7301653Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/stateless.py::functional_call:0 2025-09-07T06:57:17.7303059Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/init.py::skip_init:0, line 33 <- wrt source file 2025-09-07T06:57:17.7317665Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/init.py::skip_init:0 2025-09-07T06:57:17.7319298Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/_per_sample_grad.py::call_for_per_sample_grads:0, line 35 <- wrt source file 2025-09-07T06:57:17.7320990Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/_per_sample_grad.py::call_for_per_sample_grads:0 2025-09-07T06:57:17.7322521Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_packed_sequence:0, line 359 <- wrt source file 2025-09-07T06:57:17.7336870Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_packed_sequence:0 2025-09-07T06:57:17.7338299Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_sequence:0, line 439 <- wrt source file 2025-09-07T06:57:17.7343079Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_sequence:0 2025-09-07T06:57:17.7344471Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpad_sequence:0, line 500 <- wrt source file 2025-09-07T06:57:17.7415491Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpad_sequence:0 2025-09-07T06:57:17.7433293Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pack_sequence:0, line 556 <- wrt source file 2025-09-07T06:57:17.7434896Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pack_sequence:0 2025-09-07T06:57:17.7436281Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpack_sequence:0, line 584 <- wrt source file 2025-09-07T06:57:17.7447369Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpack_sequence:0 2025-09-07T06:57:17.7448683Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::spectral_norm:0, line 314 <- wrt source file 2025-09-07T06:57:17.7450216Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::spectral_norm:0 2025-09-07T06:57:17.7451736Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::remove_spectral_norm:0, line 346 <- wrt source file 2025-09-07T06:57:17.7453171Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::remove_spectral_norm:0 2025-09-07T06:57:17.7454924Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv2d_weight_memory_format:0, line 64 <- wrt source file 2025-09-07T06:57:17.7456399Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv2d_weight_memory_format:0 2025-09-07T06:57:17.7457901Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv3d_weight_memory_format:0, line 142 <- wrt source file 2025-09-07T06:57:17.7459390Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv3d_weight_memory_format:0 2025-09-07T06:57:17.7460706Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::weight_norm:0, line 134 <- wrt source file 2025-09-07T06:57:17.7466084Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::weight_norm:0 2025-09-07T06:57:17.7467134Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::remove_weight_norm:0, line 156 <- wrt source file 2025-09-07T06:57:17.7472275Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::remove_weight_norm:0 2025-09-07T06:57:17.7473541Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::orthogonal:0, line 265 <- wrt source file 2025-09-07T06:57:17.7474668Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::orthogonal:0 2025-09-07T06:57:17.7475722Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::weight_norm:0, line 360 <- wrt source file 2025-09-07T06:57:17.7483307Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::weight_norm:0 2025-09-07T06:57:17.7484376Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::spectral_norm:0, line 591 <- wrt source file 2025-09-07T06:57:17.7485480Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::spectral_norm:0 2025-09-07T06:57:17.7486487Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::identity:0, line 849 <- wrt source file 2025-09-07T06:57:17.7487448Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::identity:0 2025-09-07T06:57:17.7488407Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_unstructured:0, line 885 <- wrt source file 2025-09-07T06:57:17.7489437Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_unstructured:0 2025-09-07T06:57:17.7490584Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::l1_unstructured:0, line 928 <- wrt source file 
2025-09-07T06:57:17.7491569Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::l1_unstructured:0 2025-09-07T06:57:17.7492625Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_structured:0, line 968 <- wrt source file 2025-09-07T06:57:17.7493715Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_structured:0 2025-09-07T06:57:17.7494799Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::ln_structured:0, line 1014 <- wrt source file 2025-09-07T06:57:17.7501638Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::ln_structured:0 2025-09-07T06:57:17.7502667Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::global_unstructured:0, line 1067 <- wrt source file 2025-09-07T06:57:17.7519721Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::global_unstructured:0 2025-09-07T06:57:17.7520731Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::custom_from_mask:0, line 1169 <- wrt source file 2025-09-07T06:57:17.7529941Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::custom_from_mask:0 2025-09-07T06:57:17.7530889Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::remove:0, line 1197 <- wrt source file 2025-09-07T06:57:17.7538265Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::remove:0 2025-09-07T06:57:17.7539672Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::is_pruned:0, line 1225 <- wrt source file 2025-09-07T06:57:17.7546569Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/prune.py::is_pruned:0 2025-09-07T06:57:17.7548239Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/conv_utils.py::unfold3d:0, line 315 <- wrt source file 2025-09-07T06:57:17.7549894Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/conv_utils.py::unfold3d:0 2025-09-07T06:57:17.7551676Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/expanded_weights_utils.py::sum_over_all_but_batch_and_last_n:0, line 178 <- wrt source file 2025-09-07T06:57:17.7554880Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/expanded_weights_utils.py::sum_over_all_but_batch_and_last_n:0 2025-09-07T06:57:17.7556584Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py::DataParallel:0, line 127 <- wrt source file 2025-09-07T06:57:17.7558036Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py::DataParallel:0 2025-09-07T06:57:17.7559507Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel:0, line 642 <- wrt source file 2025-09-07T06:57:17.7561051Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel:0 2025-09-07T06:57:17.7562608Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.no_sync:0, line 1446 <- wrt source file 2025-09-07T06:57:17.7564255Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.no_sync:0 2025-09-07T06:57:17.7565850Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.join:0, line 1833 <- wrt source file 2025-09-07T06:57:17.7567163Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.join:0 2025-09-07T06:57:17.7568482Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:0, line 1999 <- wrt source file 2025-09-07T06:57:17.7569782Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:0 2025-09-07T06:57:17.7571046Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:1, line 2009 <- wrt source file 2025-09-07T06:57:17.7572344Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:1 2025-09-07T06:57:17.7573680Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_builtin_comm_hook:0, line 2044 <- wrt source file 2025-09-07T06:57:17.7575169Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_builtin_comm_hook:0 2025-09-07T06:57:17.7576485Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_fused_optim:0, line 2102 <- wrt source file 2025-09-07T06:57:17.7577785Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_fused_optim:0 2025-09-07T06:57:17.7578884Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding:0, line 71 <- wrt source file 2025-09-07T06:57:17.7579992Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding:0 2025-09-07T06:57:17.7581013Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding.from_pretrained:0, line 243 <- wrt source file 2025-09-07T06:57:17.7582120Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding.from_pretrained:0 2025-09-07T06:57:17.7583127Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag:0, line 322 <- wrt source file 2025-09-07T06:57:17.7595012Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag:0 2025-09-07T06:57:17.7596834Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag.from_pretrained:0, line 521 <- wrt source file 2025-09-07T06:57:17.7601872Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag.from_pretrained:0 2025-09-07T06:57:17.7603502Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::L1Loss:0, line 115 <- wrt source file 2025-09-07T06:57:17.7608360Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::L1Loss:0 2025-09-07T06:57:17.7609860Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::NLLLoss:0, line 215 <- wrt source file 2025-09-07T06:57:17.7637883Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::NLLLoss:0 2025-09-07T06:57:17.7639526Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::PoissonNLLLoss:0, line 329 <- wrt source file 2025-09-07T06:57:17.7645472Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::PoissonNLLLoss:0 2025-09-07T06:57:17.7647175Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::GaussianNLLLoss:0, line 418 <- wrt source file 2025-09-07T06:57:17.7660663Z * 
SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::GaussianNLLLoss:0 2025-09-07T06:57:17.7662233Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::KLDivLoss:0, line 535 <- wrt source file 2025-09-07T06:57:17.7670217Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::KLDivLoss:0 2025-09-07T06:57:17.7671707Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MSELoss:0, line 617 <- wrt source file 2025-09-07T06:57:17.7676894Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MSELoss:0 2025-09-07T06:57:17.7678413Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCELoss:0, line 703 <- wrt source file 2025-09-07T06:57:17.7684172Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCELoss:0 2025-09-07T06:57:17.7685684Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:0, line 778 <- wrt source file 2025-09-07T06:57:17.7694413Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:0 2025-09-07T06:57:17.7696099Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:1, line 826 <- wrt source file 2025-09-07T06:57:17.7701359Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:1 2025-09-07T06:57:17.7703145Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiLabelMarginLoss:0, line 974 <- wrt source file 2025-09-07T06:57:17.7710319Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiLabelMarginLoss:0 2025-09-07T06:57:17.7712455Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:0, line 1306 <- wrt source file 2025-09-07T06:57:17.7721904Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:0 2025-09-07T06:57:17.7723389Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:1, line 1333 <- wrt source file 2025-09-07T06:57:17.7725106Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:1 2025-09-07T06:57:17.7726596Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CosineEmbeddingLoss:0, line 1495 <- wrt source file 2025-09-07T06:57:17.7734584Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CosineEmbeddingLoss:0 2025-09-07T06:57:17.7735927Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MarginRankingLoss:0, line 1562 <- wrt source file 2025-09-07T06:57:17.7742228Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MarginRankingLoss:0 2025-09-07T06:57:17.7743628Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiMarginLoss:0, line 1643 <- wrt source file 2025-09-07T06:57:17.7750982Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiMarginLoss:0 2025-09-07T06:57:17.7752328Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginLoss:0, line 1745 <- wrt source file 2025-09-07T06:57:17.7762905Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginLoss:0 2025-09-07T06:57:17.7764342Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginWithDistanceLoss:0, line 1858 <- wrt source file 
2025-09-07T06:57:17.7784414Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginWithDistanceLoss:0 2025-09-07T06:57:17.7785916Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CTCLoss:0, line 1990 <- wrt source file 2025-09-07T06:57:17.7810151Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/loss.py::CTCLoss:0 2025-09-07T06:57:17.7811593Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm1d:0, line 332 <- wrt source file 2025-09-07T06:57:17.7819578Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm1d:0 2025-09-07T06:57:17.7821050Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm2d:0, line 443 <- wrt source file 2025-09-07T06:57:17.8064062Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm2d:0 2025-09-07T06:57:17.8065535Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm3d:0, line 554 <- wrt source file 2025-09-07T06:57:18.0180275Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm3d:0 2025-09-07T06:57:18.0496461Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm:0, line 678 <- wrt source file 2025-09-07T06:57:18.0498440Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm:0 2025-09-07T06:57:18.0500605Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm.convert_sync_batchnorm:0, line 844 <- wrt source file 2025-09-07T06:57:18.0503130Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm.convert_sync_batchnorm:0 2025-09-07T06:57:18.0504706Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/channelshuffle.py::ChannelShuffle:0, line 21 <- wrt source file 2025-09-07T06:57:18.0525747Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/channelshuffle.py::ChannelShuffle:0 2025-09-07T06:57:18.0527081Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py::Identity:0, line 34 <- wrt source file 2025-09-07T06:57:18.0532229Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py::Identity:0 2025-09-07T06:57:18.0533477Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py::Linear:0, line 83 <- wrt source file 2025-09-07T06:57:18.0541789Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py::Linear:0 2025-09-07T06:57:18.0543392Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py::Bilinear:0, line 191 <- wrt source file 2025-09-07T06:57:18.0581033Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py::Bilinear:0 2025-09-07T06:57:18.0582555Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout:0, line 60 <- wrt source file 2025-09-07T06:57:18.0586608Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout:0 2025-09-07T06:57:18.0588002Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout1d:0, line 108 <- wrt source file 2025-09-07T06:57:18.0592019Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout1d:0 2025-09-07T06:57:18.0593404Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout2d:0, line 163 <- wrt source file 2025-09-07T06:57:18.0612601Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout2d:0 2025-09-07T06:57:18.0614040Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout3d:0, line 211 <- wrt source file 2025-09-07T06:57:18.0752266Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout3d:0 2025-09-07T06:57:18.0754035Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::AlphaDropout:0, line 257 <- wrt source file 2025-09-07T06:57:18.0757387Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::AlphaDropout:0 2025-09-07T06:57:18.0759279Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::FeatureAlphaDropout:0, line 309 <- wrt source file 2025-09-07T06:57:18.0894227Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/dropout.py::FeatureAlphaDropout:0 2025-09-07T06:57:18.0896209Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool1d:0, line 129 <- wrt source file 2025-09-07T06:57:18.0901093Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool1d:0 2025-09-07T06:57:18.0903998Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool2d:0, line 207 <- wrt source file 2025-09-07T06:57:18.0945001Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool2d:0 2025-09-07T06:57:18.0946410Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool3d:0, line 291 <- wrt source file 2025-09-07T06:57:18.2524660Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool3d:0 2025-09-07T06:57:18.2620085Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool1d:0, line 366 <- wrt source file 2025-09-07T06:57:18.2635385Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool1d:0 2025-09-07T06:57:18.2636868Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool2d:0, line 452 <- wrt source file 2025-09-07T06:57:18.2659155Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool2d:0 2025-09-07T06:57:18.2660619Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool3d:0, line 550 <- wrt source file 2025-09-07T06:57:18.3108151Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool3d:0 2025-09-07T06:57:18.3109670Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool1d:0, line 642 <- wrt source file 2025-09-07T06:57:18.3117243Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool1d:0 2025-09-07T06:57:18.3118814Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool2d:0, line 738 <- wrt source file 2025-09-07T06:57:18.3171152Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool2d:0 2025-09-07T06:57:18.3172580Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool3d:0, line 855 <- wrt source file 2025-09-07T06:57:18.4725922Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool3d:0 2025-09-07T06:57:18.4815626Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool2d:0, line 946 <- wrt source file 2025-09-07T06:57:18.4880213Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool2d:0 2025-09-07T06:57:18.4881840Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool3d:0, line 1033 <- wrt source file 2025-09-07T06:57:18.5713695Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool3d:0 2025-09-07T06:57:18.5715412Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool1d:0, line 1152 <- wrt source file 2025-09-07T06:57:18.5722195Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool1d:0 2025-09-07T06:57:18.5723643Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool2d:0, line 1204 <- wrt source file 2025-09-07T06:57:18.5783059Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool2d:0 2025-09-07T06:57:18.5784549Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool3d:0, line 1264 <- wrt source file 2025-09-07T06:57:18.7480383Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool3d:0 2025-09-07T06:57:18.7572601Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool1d:0, line 1320 <- wrt source file 2025-09-07T06:57:18.7579297Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool1d:0 2025-09-07T06:57:18.7581253Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool2d:0, line 1355 <- wrt source file 
2025-09-07T06:57:18.7589271Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool2d:0 2025-09-07T06:57:18.7590883Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool3d:0, line 1399 <- wrt source file 2025-09-07T06:57:18.7606348Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool3d:0 2025-09-07T06:57:18.7607928Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool1d:0, line 1447 <- wrt source file 2025-09-07T06:57:18.7612559Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool1d:0 2025-09-07T06:57:18.7614320Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool2d:0, line 1481 <- wrt source file 2025-09-07T06:57:18.7621777Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool2d:0 2025-09-07T06:57:18.7623436Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool3d:0, line 1521 <- wrt source file 2025-09-07T06:57:18.7642743Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool3d:0 2025-09-07T06:57:18.7644247Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Threshold:0, line 72 <- wrt source file 2025-09-07T06:57:18.7647407Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Threshold:0 2025-09-07T06:57:18.7648845Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU:0, line 120 <- wrt source file 2025-09-07T06:57:18.7654265Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU:0 
2025-09-07T06:57:18.7655501Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::RReLU:0, line 185 <- wrt source file 2025-09-07T06:57:18.7658881Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::RReLU:0 2025-09-07T06:57:18.7660095Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardtanh:0, line 247 <- wrt source file 2025-09-07T06:57:18.7663525Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardtanh:0 2025-09-07T06:57:18.7664753Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU6:0, line 318 <- wrt source file 2025-09-07T06:57:18.7667640Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU6:0 2025-09-07T06:57:18.7668851Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Sigmoid:0, line 349 <- wrt source file 2025-09-07T06:57:18.7671685Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Sigmoid:0 2025-09-07T06:57:18.7672937Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardsigmoid:0, line 384 <- wrt source file 2025-09-07T06:57:18.7675543Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardsigmoid:0 2025-09-07T06:57:18.7676624Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanh:0, line 420 <- wrt source file 2025-09-07T06:57:18.7679639Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanh:0 2025-09-07T06:57:18.7680672Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::SiLU:0, line 456 <- wrt source file 2025-09-07T06:57:18.7683607Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::SiLU:0 2025-09-07T06:57:18.7684654Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Mish:0, line 501 <- wrt source file 2025-09-07T06:57:18.7687946Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Mish:0 2025-09-07T06:57:18.7689038Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardswish:0, line 552 <- wrt source file 2025-09-07T06:57:18.7692055Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardswish:0 2025-09-07T06:57:18.7693275Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::ELU:0, line 598 <- wrt source file 2025-09-07T06:57:18.7696701Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::ELU:0 2025-09-07T06:57:18.7697697Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::CELU:0, line 646 <- wrt source file 2025-09-07T06:57:18.7700742Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::CELU:0 2025-09-07T06:57:18.7701754Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::SELU:0, line 705 <- wrt source file 2025-09-07T06:57:18.7706142Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::SELU:0 2025-09-07T06:57:18.7707155Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::GLU:0, line 751 <- wrt source file 2025-09-07T06:57:18.7709504Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::GLU:0 2025-09-07T06:57:18.7710991Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::GELU:0, 
line 799 <- wrt source file 2025-09-07T06:57:18.7736470Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::GELU:0 2025-09-07T06:57:18.7737974Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardshrink:0, line 848 <- wrt source file 2025-09-07T06:57:18.7740841Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardshrink:0 2025-09-07T06:57:18.7742536Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::LeakyReLU:0, line 903 <- wrt source file 2025-09-07T06:57:18.7745152Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::LeakyReLU:0 2025-09-07T06:57:18.7746635Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSigmoid:0, line 945 <- wrt source file 2025-09-07T06:57:18.7749858Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSigmoid:0 2025-09-07T06:57:18.7751331Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softplus:0, line 981 <- wrt source file 2025-09-07T06:57:18.7754543Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softplus:0 2025-09-07T06:57:18.7756079Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softshrink:0, line 1030 <- wrt source file 2025-09-07T06:57:18.7758663Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softshrink:0 2025-09-07T06:57:18.7760205Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::MultiheadAttention:0, line 1144 <- wrt source file 2025-09-07T06:57:18.7761823Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::MultiheadAttention:0 2025-09-07T06:57:18.7763778Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::PReLU:0, line 1609 <- wrt source file 2025-09-07T06:57:18.7765118Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::PReLU:0 2025-09-07T06:57:18.7766515Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softsign:0, line 1660 <- wrt source file 2025-09-07T06:57:18.7768173Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softsign:0 2025-09-07T06:57:18.7769505Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanhshrink:0, line 1686 <- wrt source file 2025-09-07T06:57:18.7772604Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanhshrink:0 2025-09-07T06:57:18.7773994Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmin:0, line 1724 <- wrt source file 2025-09-07T06:57:18.7777294Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmin:0 2025-09-07T06:57:18.7778342Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax:0, line 1788 <- wrt source file 2025-09-07T06:57:18.7782082Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax:0 2025-09-07T06:57:18.7783135Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax2d:0, line 1835 <- wrt source file 2025-09-07T06:57:18.7786378Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax2d:0 2025-09-07T06:57:18.7787452Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSoftmax:0, line 1874 <- wrt source file 2025-09-07T06:57:18.7791051Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSoftmax:0 2025-09-07T06:57:18.7792291Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.register_buffer:0, line 551 <- wrt source file 2025-09-07T06:57:18.7793441Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.register_buffer:0 2025-09-07T06:57:18.7794537Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.apply:0, line 1039 <- wrt source file 2025-09-07T06:57:18.7809830Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.apply:0 2025-09-07T06:57:18.7810884Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.to:0, line 1290 <- wrt source file 2025-09-07T06:57:18.7819463Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.to:0 2025-09-07T06:57:18.7820992Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.state_dict:0, line 2229 <- wrt source file 2025-09-07T06:57:18.7822405Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.state_dict:0 2025-09-07T06:57:18.7823596Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.parameters:0, line 2670 <- wrt source file 2025-09-07T06:57:18.7824875Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.parameters:0 2025-09-07T06:57:18.7826314Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_parameters:0, line 2698 <- wrt source file 
2025-09-07T06:57:18.7827966Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_parameters:0 2025-09-07T06:57:18.7829339Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.buffers:0, line 2725 <- wrt source file 2025-09-07T06:57:18.7830680Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.buffers:0 2025-09-07T06:57:18.7831922Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_buffers:0, line 2752 <- wrt source file 2025-09-07T06:57:18.7833217Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_buffers:0 2025-09-07T06:57:18.7834695Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_children:0, line 2783 <- wrt source file 2025-09-07T06:57:18.7835789Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_children:0 2025-09-07T06:57:18.7836808Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.modules:0, line 2807 <- wrt source file 2025-09-07T06:57:18.7837827Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.modules:0 2025-09-07T06:57:18.7838829Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_modules:0, line 2845 <- wrt source file 2025-09-07T06:57:18.7839875Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_modules:0 2025-09-07T06:57:18.7841182Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::CircularPad1d:0, line 70 <- wrt source file 2025-09-07T06:57:18.7842213Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::CircularPad1d:0 2025-09-07T06:57:18.7843314Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::CircularPad2d:0, line 122 <- wrt source file 2025-09-07T06:57:18.7865518Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::CircularPad2d:0 2025-09-07T06:57:18.7867093Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::CircularPad3d:0, line 187 <- wrt source file 2025-09-07T06:57:19.3831974Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::CircularPad3d:0 2025-09-07T06:57:19.4429310Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad1d:0, line 241 <- wrt source file 2025-09-07T06:57:19.4442070Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad1d:0 2025-09-07T06:57:19.4443114Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad2d:0, line 294 <- wrt source file 2025-09-07T06:57:19.4449429Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad2d:0 2025-09-07T06:57:19.4450439Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad3d:0, line 350 <- wrt source file 2025-09-07T06:57:19.4476717Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad3d:0 2025-09-07T06:57:19.4478640Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad1d:0, line 395 <- wrt source file 2025-09-07T06:57:19.4488593Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad1d:0 2025-09-07T06:57:19.4490331Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad2d:0, line 439 <- wrt source file 2025-09-07T06:57:19.4657080Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad2d:0 2025-09-07T06:57:19.4659035Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad3d:0, line 497 <- wrt source file 2025-09-07T06:57:19.4663666Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad3d:0 2025-09-07T06:57:19.4665580Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad1d:0, line 556 <- wrt source file 2025-09-07T06:57:19.4671401Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad1d:0 2025-09-07T06:57:19.4672934Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad2d:0, line 600 <- wrt source file 2025-09-07T06:57:19.4679209Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad2d:0 2025-09-07T06:57:19.4680580Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad3d:0, line 658 <- wrt source file 2025-09-07T06:57:19.9445064Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad3d:0 2025-09-07T06:57:20.0028529Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad1d:0, line 692 <- wrt source file 2025-09-07T06:57:20.0040920Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad1d:0 2025-09-07T06:57:20.0042360Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad2d:0, line 750 <- wrt source file 
2025-09-07T06:57:20.0047429Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad2d:0 2025-09-07T06:57:20.0048510Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad3d:0, line 812 <- wrt source file 2025-09-07T06:57:20.0139979Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad3d:0 2025-09-07T06:57:20.0141473Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/lazy.py::LazyModuleMixin:0, line 77 <- wrt source file 2025-09-07T06:57:20.0143765Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/lazy.py::LazyModuleMixin:0 2025-09-07T06:57:20.0145228Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::Upsample:0, line 77 <- wrt source file 2025-09-07T06:57:20.0171345Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::Upsample:0 2025-09-07T06:57:20.0172910Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingNearest2d:0, line 229 <- wrt source file 2025-09-07T06:57:20.0184887Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingNearest2d:0 2025-09-07T06:57:20.0186706Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingBilinear2d:0, line 279 <- wrt source file 2025-09-07T06:57:20.0194090Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingBilinear2d:0 2025-09-07T06:57:20.0195685Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/fold.py::Fold:0, line 224 <- wrt source file 2025-09-07T06:57:20.0199748Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/fold.py::Fold:0 2025-09-07T06:57:20.0201094Z * 
DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/fold.py::Unfold:0, line 395 <- wrt source file 2025-09-07T06:57:20.0224283Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/fold.py::Unfold:0 2025-09-07T06:57:20.0225750Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer:0, line 90 <- wrt source file 2025-09-07T06:57:21.1618764Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer:0 2025-09-07T06:57:21.1632009Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer.forward:0, line 258 <- wrt source file 2025-09-07T06:57:21.1633734Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer.forward:0 2025-09-07T06:57:21.1635367Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoder:0, line 336 <- wrt source file 2025-09-07T06:57:21.3498604Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoder:0 2025-09-07T06:57:21.3503828Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoder:0, line 562 <- wrt source file 2025-09-07T06:57:21.6260020Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoder:0 2025-09-07T06:57:21.6267437Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoderLayer:0, line 686 <- wrt source file 2025-09-07T06:57:21.6597373Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoderLayer:0 2025-09-07T06:57:21.6786410Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoderLayer:0, line 995 <- wrt source file 2025-09-07T06:57:21.7744920Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoderLayer:0 2025-09-07T06:57:21.7746579Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNN:0, line 595 <- wrt source file 2025-09-07T06:57:21.7763178Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNN:0 2025-09-07T06:57:21.7764560Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTM:0, line 953 <- wrt source file 2025-09-07T06:57:21.7958736Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTM:0 2025-09-07T06:57:21.7960361Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRU:0, line 1288 <- wrt source file 2025-09-07T06:57:21.7978695Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRU:0 2025-09-07T06:57:21.7980073Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNNCell:0, line 1537 <- wrt source file 2025-09-07T06:57:21.7992632Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNNCell:0 2025-09-07T06:57:21.7993991Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTMCell:0, line 1659 <- wrt source file 2025-09-07T06:57:21.8004508Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTMCell:0 2025-09-07T06:57:21.8006040Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRUCell:0, line 1773 <- wrt source file 2025-09-07T06:57:21.8020740Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRUCell:0 
2025-09-07T06:57:21.8022192Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/distance.py::PairwiseDistance:0, line 38 <- wrt source file 2025-09-07T06:57:21.8028040Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/distance.py::PairwiseDistance:0 2025-09-07T06:57:21.8029555Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/distance.py::CosineSimilarity:0, line 81 <- wrt source file 2025-09-07T06:57:21.8035273Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/distance.py::CosineSimilarity:0 2025-09-07T06:57:21.8036731Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Flatten:0, line 30 <- wrt source file 2025-09-07T06:57:21.8042523Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Flatten:0 2025-09-07T06:57:21.8043780Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Unflatten:0, line 87 <- wrt source file 2025-09-07T06:57:21.8058913Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Unflatten:0 2025-09-07T06:57:21.8060242Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelShuffle:0, line 40 <- wrt source file 2025-09-07T06:57:21.8065740Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelShuffle:0 2025-09-07T06:57:21.8067323Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelUnshuffle:0, line 99 <- wrt source file 2025-09-07T06:57:21.8072369Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelUnshuffle:0 2025-09-07T06:57:21.8073711Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential:0, line 81 <- wrt 
source file 2025-09-07T06:57:21.8074942Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential:0 2025-09-07T06:57:21.8076191Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential.append:0, line 260 <- wrt source file 2025-09-07T06:57:21.8083040Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential.append:0 2025-09-07T06:57:21.8084313Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential.insert:0, line 283 <- wrt source file 2025-09-07T06:57:21.8090700Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential.insert:0 2025-09-07T06:57:21.8091917Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential.extend:0, line 314 <- wrt source file 2025-09-07T06:57:21.8100610Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential.extend:0 2025-09-07T06:57:21.8102140Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleList:0, line 343 <- wrt source file 2025-09-07T06:57:21.8103704Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleList:0 2025-09-07T06:57:21.8105037Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleDict:0, line 523 <- wrt source file 2025-09-07T06:57:21.8106279Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleDict:0 2025-09-07T06:57:21.8107504Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterList:0, line 653 <- wrt source file 2025-09-07T06:57:21.8108774Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterList:0 2025-09-07T06:57:21.8110016Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterDict:0, line 808 <- wrt source file 2025-09-07T06:57:21.8111280Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterDict:0 2025-09-07T06:57:21.8112404Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm1d:0, line 187 <- wrt source file 2025-09-07T06:57:21.8118619Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm1d:0 2025-09-07T06:57:21.8119779Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm2d:0, line 303 <- wrt source file 2025-09-07T06:57:21.8393305Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm2d:0 2025-09-07T06:57:21.8394990Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm3d:0, line 419 <- wrt source file 2025-09-07T06:57:21.9873308Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm3d:0 2025-09-07T06:57:22.0015620Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LocalResponseNorm:0, line 38 <- wrt source file 2025-09-07T06:57:22.0067867Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LocalResponseNorm:0 2025-09-07T06:57:22.0069439Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LayerNorm:0, line 163 <- wrt source file 2025-09-07T06:57:22.0079137Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LayerNorm:0 
2025-09-07T06:57:22.0080638Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::GroupNorm:0, line 274 <- wrt source file 2025-09-07T06:57:22.0088247Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::GroupNorm:0 2025-09-07T06:57:22.0089739Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::RMSNorm:0, line 367 <- wrt source file 2025-09-07T06:57:22.0094651Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/normalization.py::RMSNorm:0 2025-09-07T06:57:22.0096192Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/bias.py::CausalBias:0, line 95 <- wrt source file 2025-09-07T06:57:22.0097758Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/bias.py::CausalBias:0 2025-09-07T06:57:22.0099095Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/masked/_ops.py::logaddexp:0, line 1530 <- wrt source file 2025-09-07T06:57:22.0117682Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/masked/_ops.py::logaddexp:0 2025-09-07T06:57:22.0119338Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/masked/maskedtensor/core.py::is_masked_tensor:0, line 25 <- wrt source file 2025-09-07T06:57:22.0120883Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/masked/maskedtensor/core.py::is_masked_tensor:0 2025-09-07T06:57:22.0122111Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_prims/context.py::TorchRefsMode:0, line 95 <- wrt source file 2025-09-07T06:57:22.0123299Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_prims/context.py::TorchRefsMode:0 2025-09-07T06:57:22.0124874Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::_coalescing_manager:0, line 2573 <- wrt source 
file 2025-09-07T06:57:22.0126572Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::_coalescing_manager:0 2025-09-07T06:57:22.0128364Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::_time_estimator:0, line 2675 <- wrt source file 2025-09-07T06:57:22.0130083Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::_time_estimator:0 2025-09-07T06:57:22.0131544Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::batch_isend_irecv:0, line 2722 <- wrt source file 2025-09-07T06:57:22.0133041Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::batch_isend_irecv:0 2025-09-07T06:57:22.0135387Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_reduce:0, line 2859 <- wrt source file 2025-09-07T06:57:22.0136788Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_reduce:0 2025-09-07T06:57:22.0138094Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_object:0, line 3146 <- wrt source file 2025-09-07T06:57:22.0139373Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_object:0 2025-09-07T06:57:22.0140616Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather_object:0, line 3250 <- wrt source file 2025-09-07T06:57:22.0141728Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather_object:0 2025-09-07T06:57:22.0142810Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::send_object_list:0, line 3380 <- wrt source file 
2025-09-07T06:57:22.0143906Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::send_object_list:0 2025-09-07T06:57:22.0144978Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::recv_object_list:0, line 3497 <- wrt source file 2025-09-07T06:57:22.0146219Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::recv_object_list:0 2025-09-07T06:57:22.0147447Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::broadcast_object_list:0, line 3643 <- wrt source file 2025-09-07T06:57:22.0148600Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::broadcast_object_list:0 2025-09-07T06:57:22.0149795Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter_object_list:0, line 3766 <- wrt source file 2025-09-07T06:57:22.0151003Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter_object_list:0 2025-09-07T06:57:22.0152256Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather:0, line 3868 <- wrt source file 2025-09-07T06:57:22.0153364Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather:0 2025-09-07T06:57:22.0154450Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_into_tensor:0, line 3975 <- wrt source file 2025-09-07T06:57:22.0155645Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_into_tensor:0 2025-09-07T06:57:22.0156769Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_coalesced:0, line 4113 
<- wrt source file 2025-09-07T06:57:22.0157895Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_coalesced:0 2025-09-07T06:57:22.0158948Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather:0, line 4219 <- wrt source file 2025-09-07T06:57:22.0160017Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather:0 2025-09-07T06:57:22.0161031Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter:0, line 4304 <- wrt source file 2025-09-07T06:57:22.0162180Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter:0 2025-09-07T06:57:22.0163274Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::reduce_scatter_tensor:0, line 4442 <- wrt source file 2025-09-07T06:57:22.0164423Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::reduce_scatter_tensor:0 2025-09-07T06:57:22.0165523Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all_single:0, line 4584 <- wrt source file 2025-09-07T06:57:22.0166621Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all_single:0 2025-09-07T06:57:22.0167668Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all:0, line 4718 <- wrt source file 2025-09-07T06:57:22.0168711Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all:0 2025-09-07T06:57:22.0169778Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::monitored_barrier:0, line 4926 <- wrt source file 
2025-09-07T06:57:22.0170884Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::monitored_barrier:0 2025-09-07T06:57:22.0172059Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups:0, line 5468 <- wrt source file 2025-09-07T06:57:22.0173159Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups:0 2025-09-07T06:57:22.0174424Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups_by_enumeration:0, line 5562 <- wrt source file 2025-09-07T06:57:22.0175697Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups_by_enumeration:0 2025-09-07T06:57:22.0176717Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/run.py::__doc__:0, line 57 <- wrt source file 2025-09-07T06:57:22.0177623Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/run.py::__doc__:0 2025-09-07T06:57:22.0178519Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/launch.py::__doc__:0, line 84 <- wrt source file 2025-09-07T06:57:22.0179440Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/launch.py::__doc__:0 2025-09-07T06:57:22.0180390Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_distributed_c10d.py::__doc__:0, line 11 <- wrt source file 2025-09-07T06:57:22.0181400Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_distributed_c10d.py::__doc__:0 2025-09-07T06:57:22.0182387Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py::DeviceMesh:0, line 410 <- wrt source file 2025-09-07T06:57:22.0183386Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py::DeviceMesh:0 2025-09-07T06:57:22.0184433Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py::DeviceMesh.get_local_rank:0, line 955 <- wrt source file 2025-09-07T06:57:22.0185552Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py::DeviceMesh.get_local_rank:0 2025-09-07T06:57:22.0186710Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py::init_device_mesh:0, line 1101 <- wrt source file 2025-09-07T06:57:22.0187773Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py::init_device_mesh:0 2025-09-07T06:57:22.0188782Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/autograd/__init__.py::context:0, line 47 <- wrt source file 2025-09-07T06:57:22.0189805Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/autograd/__init__.py::context:0 2025-09-07T06:57:22.0190849Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_tools/memory_tracker.py::MemoryTracker:0, line 55 <- wrt source file 2025-09-07T06:57:22.0191955Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_tools/memory_tracker.py::MemoryTracker:0 2025-09-07T06:57:22.0193023Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/functional.py::_all_gather_base:0, line 130 <- wrt source file 2025-09-07T06:57:22.0194084Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/functional.py::_all_gather_base:0 2025-09-07T06:57:22.0195189Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.__init__:0, line 196 <- wrt source file 2025-09-07T06:57:22.0196358Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.__init__:0 2025-09-07T06:57:22.0197715Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.init_from_module_rref:0, line 520 <- wrt source file 2025-09-07T06:57:22.0199051Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.init_from_module_rref:0 2025-09-07T06:57:22.0200289Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::RemoteModule:0, line 643 <- wrt source file 2025-09-07T06:57:22.0201399Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::RemoteModule:0 2025-09-07T06:57:22.0202567Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/options.py::TensorPipeRpcBackendOptions.set_device_map:0, line 125 <- wrt source file 2025-09-07T06:57:22.0203856Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/options.py::TensorPipeRpcBackendOptions.set_device_map:0 2025-09-07T06:57:22.0204949Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::_wait_all:0, line 174 <- wrt source file 2025-09-07T06:57:22.0205912Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::_wait_all:0 2025-09-07T06:57:22.0206858Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::shutdown:0, line 345 <- wrt source file 2025-09-07T06:57:22.0207813Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::shutdown:0 2025-09-07T06:57:22.0208730Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::remote:0, line 606 <- wrt source file 2025-09-07T06:57:22.0209681Z * 
SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::remote:0 2025-09-07T06:57:22.0210663Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_sync:0, line 786 <- wrt source file 2025-09-07T06:57:22.0211617Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_sync:0 2025-09-07T06:57:22.0212543Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_async:0, line 878 <- wrt source file 2025-09-07T06:57:22.0213487Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_async:0 2025-09-07T06:57:22.0214535Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/functions.py::async_execution:0, line 34 <- wrt source file 2025-09-07T06:57:22.0215597Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/functions.py::async_execution:0 2025-09-07T06:57:22.0216782Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/server_process_global_profiler.py::_server_process_global_profile:0, line 60 <- wrt source file 2025-09-07T06:57:22.0218130Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/rpc/server_process_global_profiler.py::_server_process_global_profile:0 2025-09-07T06:57:22.0219321Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::save:0, line 159 <- wrt source file 2025-09-07T06:57:22.0220410Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::save:0 2025-09-07T06:57:22.0221600Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::async_save:0, line 273 <- wrt source file 2025-09-07T06:57:22.0222739Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::async_save:0 2025-09-07T06:57:22.0224020Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/format_utils.py::BroadcastingTorchSaveReader:0, line 49 <- wrt source file 2025-09-07T06:57:22.0225385Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/format_utils.py::BroadcastingTorchSaveReader:0 2025-09-07T06:57:22.0226606Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/format_utils.py::DynamicMetaLoadPlanner:0, line 161 <- wrt source file 2025-09-07T06:57:22.0227837Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/format_utils.py::DynamicMetaLoadPlanner:0 2025-09-07T06:57:22.0229057Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/optimizer.py::load_sharded_optimizer_state_dict:0, line 225 <- wrt source file 2025-09-07T06:57:22.0230331Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/optimizer.py::load_sharded_optimizer_state_dict:0 2025-09-07T06:57:22.0231502Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py::load:0, line 131 <- wrt source file 2025-09-07T06:57:22.0232602Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py::load:0 2025-09-07T06:57:22.0233693Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict.py::get_state_dict:0, line 1144 <- wrt source file 2025-09-07T06:57:22.0234819Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict.py::get_state_dict:0 2025-09-07T06:57:22.0236035Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict.py::_patch_model_state_dict:0, line 1395 <- wrt source file 2025-09-07T06:57:22.0237232Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict.py::_patch_model_state_dict:0 2025-09-07T06:57:22.0238414Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict.py::_patch_optimizer_state_dict:0, line 1454 <- wrt source file 2025-09-07T06:57:22.0239636Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict.py::_patch_optimizer_state_dict:0 2025-09-07T06:57:22.0240860Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/staging.py::DefaultStager.close:0, line 206 <- wrt source file 2025-09-07T06:57:22.0242121Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/staging.py::DefaultStager.close:0 2025-09-07T06:57:22.0243358Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/builder.py::make_sync_checkpointer:0, line 77 <- wrt source file 2025-09-07T06:57:22.0244631Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/builder.py::make_sync_checkpointer:0 2025-09-07T06:57:22.0245888Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/builder.py::make_async_checkpointer:0, line 138 <- wrt source file 2025-09-07T06:57:22.0247242Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/builder.py::make_async_checkpointer:0 2025-09-07T06:57:22.0248467Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/barriers.py::BarrierConfig:0, line 50 <- wrt source file 
2025-09-07T06:57:22.0249768Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/barriers.py::BarrierConfig:0 2025-09-07T06:57:22.0251060Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::SyncCheckpointer:0, line 104 <- wrt source file 2025-09-07T06:57:22.0252341Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::SyncCheckpointer:0 2025-09-07T06:57:22.0253628Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::SyncCheckpointer.save:0, line 142 <- wrt source file 2025-09-07T06:57:22.0255023Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::SyncCheckpointer.save:0 2025-09-07T06:57:22.0256317Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::AsyncCheckpointer:0, line 213 <- wrt source file 2025-09-07T06:57:22.0257618Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::AsyncCheckpointer:0 2025-09-07T06:57:22.0258904Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::AsyncCheckpointer.save:0, line 260 <- wrt source file 2025-09-07T06:57:22.0260245Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/_experimental/checkpointer.py::AsyncCheckpointer.save:0 2025-09-07T06:57:22.0261478Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/_IR.py::pipe_split:0, line 333 <- wrt source file 2025-09-07T06:57:22.0262519Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/_IR.py::pipe_split:0 2025-09-07T06:57:22.0263594Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/microbatch.py::_CustomReducer:0, line 34 <- wrt source file 2025-09-07T06:57:22.0264738Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/microbatch.py::_CustomReducer:0 2025-09-07T06:57:22.0265905Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/microbatch.py::TensorChunkSpec.from_tuple:0, line 83 <- wrt source file 2025-09-07T06:57:22.0267147Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/microbatch.py::TensorChunkSpec.from_tuple:0 2025-09-07T06:57:22.0268360Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/microbatch.py::TensorChunkSpec.from_dict:0, line 102 <- wrt source file 2025-09-07T06:57:22.0269592Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/pipelining/microbatch.py::TensorChunkSpec.from_dict:0 2025-09-07T06:57:22.0270776Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/sharded_grad_scaler.py::ShardedGradScaler:0, line 54 <- wrt source file 2025-09-07T06:57:22.0271947Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/sharded_grad_scaler.py::ShardedGradScaler:0 2025-09-07T06:57:22.0273114Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::MixedPrecision:0, line 202 <- wrt source file 2025-09-07T06:57:22.0274125Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::MixedPrecision:0 2025-09-07T06:57:22.0275210Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::StateDictType:0, line 262 <- wrt source file 
2025-09-07T06:57:22.0276301Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::StateDictType:0 2025-09-07T06:57:22.0277440Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel:0, line 125 <- wrt source file 2025-09-07T06:57:22.0278733Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel:0 2025-09-07T06:57:22.0280084Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.set_state_dict_type:0, line 651 <- wrt source file 2025-09-07T06:57:22.0281530Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.set_state_dict_type:0 2025-09-07T06:57:22.0282938Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.state_dict_type:0, line 798 <- wrt source file 2025-09-07T06:57:22.0284349Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.state_dict_type:0 2025-09-07T06:57:22.0285800Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.shard_full_optim_state_dict:0, line 1490 <- wrt source file 2025-09-07T06:57:22.0287408Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.shard_full_optim_state_dict:0 2025-09-07T06:57:22.0288913Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.scatter_full_optim_state_dict:0, 
line 1610 <- wrt source file 2025-09-07T06:57:22.0290440Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.scatter_full_optim_state_dict:0 2025-09-07T06:57:22.0291908Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.rekey_optim_state_dict:0, line 1695 <- wrt source file 2025-09-07T06:57:22.0293376Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.rekey_optim_state_dict:0 2025-09-07T06:57:22.0294861Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.optim_state_dict:0, line 1824 <- wrt source file 2025-09-07T06:57:22.0296300Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.optim_state_dict:0 2025-09-07T06:57:22.0297725Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.optim_state_dict_to_load:0, line 1911 <- wrt source file 2025-09-07T06:57:22.0299300Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.optim_state_dict_to_load:0 2025-09-07T06:57:22.0300511Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py::CustomPolicy:0, line 224 <- wrt source file 2025-09-07T06:57:22.0301606Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py::CustomPolicy:0 2025-09-07T06:57:22.0302809Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/_random.py::OffsetBasedRNGTracker._set_pre_op_offset:0, line 
294 <- wrt source file 2025-09-07T06:57:22.0304061Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/_random.py::OffsetBasedRNGTracker._set_pre_op_offset:0 2025-09-07T06:57:22.0305168Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/_api.py::_shard_tensor:0, line 828 <- wrt source file 2025-09-07T06:57:22.0306186Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/_api.py::_shard_tensor:0 2025-09-07T06:57:22.0307252Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/loss.py::loss_parallel:0, line 56 <- wrt source file 2025-09-07T06:57:22.0308375Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/loss.py::loss_parallel:0 2025-09-07T06:57:22.0309503Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/ddp.py::_pre_dp_module_transform:0, line 88 <- wrt source file 2025-09-07T06:57:22.0310697Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/ddp.py::_pre_dp_module_transform:0 2025-09-07T06:57:22.0311852Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::ColwiseParallel:0, line 64 <- wrt source file 2025-09-07T06:57:22.0313092Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::ColwiseParallel:0 2025-09-07T06:57:22.0314243Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::RowwiseParallel:0, line 198 <- wrt source file 2025-09-07T06:57:22.0315400Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::RowwiseParallel:0 2025-09-07T06:57:22.0316526Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::SequenceParallel:0, line 350 <- wrt source file 2025-09-07T06:57:22.0317684Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::SequenceParallel:0 2025-09-07T06:57:22.0318825Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::PrepareModuleInput:0, line 452 <- wrt source file 2025-09-07T06:57:22.0320009Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::PrepareModuleInput:0 2025-09-07T06:57:22.0321172Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::PrepareModuleOutput:0, line 615 <- wrt source file 2025-09-07T06:57:22.0322355Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::PrepareModuleOutput:0 2025-09-07T06:57:22.0323541Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::PrepareModuleInputOutput:0, line 740 <- wrt source file 2025-09-07T06:57:22.0324861Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/style.py::PrepareModuleInputOutput:0 2025-09-07T06:57:22.0326026Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/api.py::parallelize_module:0, line 56 <- wrt source file 2025-09-07T06:57:22.0327314Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/parallel/api.py::parallelize_module:0 2025-09-07T06:57:22.0328515Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/experimental/_register_sharding.py::register_sharding:0, line 47 <- wrt source file 2025-09-07T06:57:22.0329811Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/experimental/_register_sharding.py::register_sharding:0 2025-09-07T06:57:22.0331005Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/experimental/_func_map.py::local_map:0, line 103 <- wrt source file 2025-09-07T06:57:22.0332164Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/experimental/_func_map.py::local_map:0 2025-09-07T06:57:22.0333337Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/_ops/_common_rules.py::pointwise_rule:0, line 230 <- wrt source file 2025-09-07T06:57:22.0334555Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/tensor/_ops/_common_rules.py::pointwise_rule:0 2025-09-07T06:57:22.0335667Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/contract.py::contract:0, line 66 <- wrt source file 2025-09-07T06:57:22.0336742Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/contract.py::contract:0 2025-09-07T06:57:22.0337854Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/checkpoint_activation.py::checkpoint:0, line 53 <- wrt source file 2025-09-07T06:57:22.0339135Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/checkpoint_activation.py::checkpoint:0 2025-09-07T06:57:22.0340249Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/replicate.py::replicate:0, line 190 <- wrt source file 2025-09-07T06:57:22.0341361Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/replicate.py::replicate:0 2025-09-07T06:57:22.0342452Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/replicate_with_fsdp.py::replicate:0, line 247 
<- wrt source file 2025-09-07T06:57:22.0343633Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_composable/replicate_with_fsdp.py::replicate:0 2025-09-07T06:57:22.0344797Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::init_from_local_shards:0, line 384 <- wrt source file 2025-09-07T06:57:22.0346021Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::init_from_local_shards:0 2025-09-07T06:57:22.0347217Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::custom_sharded_op_impl:0, line 457 <- wrt source file 2025-09-07T06:57:22.0348450Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::custom_sharded_op_impl:0 2025-09-07T06:57:22.0349795Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_optim/__init__.py::named_params_with_sharded_tensor:0, line 31 <- wrt source file 2025-09-07T06:57:22.0351100Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_optim/__init__.py::named_params_with_sharded_tensor:0 2025-09-07T06:57:22.0352372Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharding_plan/api.py::ShardingPlan:0, line 36 <- wrt source file 2025-09-07T06:57:22.0353580Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharding_plan/api.py::ShardingPlan:0 2025-09-07T06:57:22.0354781Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor._init_from_local_tensor:0, line 856 <- wrt source file 2025-09-07T06:57:22.0356102Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor._init_from_local_tensor:0 2025-09-07T06:57:22.0357355Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor.reshard:0, line 1094 <- wrt source file 2025-09-07T06:57:22.0358581Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor.reshard:0 2025-09-07T06:57:22.0359784Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/_ops/_common.py::_sharded_op_common:0, line 18 <- wrt source file 2025-09-07T06:57:22.0361013Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/_ops/_common.py::_sharded_op_common:0 2025-09-07T06:57:22.0372341Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/elastic/events/__init__.py::construct_and_record_rdzv_event:0, line 110 <- wrt source file 2025-09-07T06:57:22.0373724Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/elastic/events/__init__.py::construct_and_record_rdzv_event:0 2025-09-07T06:57:22.0375201Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/elastic/utils/distributed.py::get_free_port:0, line 141 <- wrt source file 2025-09-07T06:57:22.0376431Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/elastic/utils/distributed.py::get_free_port:0 2025-09-07T06:57:22.0377654Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/api.py::RendezvousHandler.shutdown:0, line 231 <- wrt source file 2025-09-07T06:57:22.0378930Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/api.py::RendezvousHandler.shutdown:0 2025-09-07T06:57:22.0380105Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::put:0, line 142 <- wrt source file 2025-09-07T06:57:22.0381238Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::put:0 2025-09-07T06:57:22.0382351Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::get:0, line 195 <- wrt source file 2025-09-07T06:57:22.0383481Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::get:0 2025-09-07T06:57:22.0384644Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::putmem_signal_block:0, line 268 <- wrt source file 2025-09-07T06:57:22.0385997Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::putmem_signal_block:0 2025-09-07T06:57:22.0387175Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::wait_until:0, line 323 <- wrt source file 2025-09-07T06:57:22.0388447Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::wait_until:0 2025-09-07T06:57:22.0389703Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::signal_wait_until:0, line 386 <- wrt source file 2025-09-07T06:57:22.0390934Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::signal_wait_until:0 2025-09-07T06:57:22.0392106Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::signal_op:0, line 437 <- wrt source file 2025-09-07T06:57:22.0393278Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::signal_op:0 2025-09-07T06:57:22.0394414Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::fence:0, line 490 <- wrt source file 2025-09-07T06:57:22.0395572Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::fence:0 2025-09-07T06:57:22.0396706Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::quiet:0, line 536 <- wrt source file 2025-09-07T06:57:22.0397874Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::quiet:0 2025-09-07T06:57:22.0398998Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::my_pe:0, line 580 <- wrt source file 2025-09-07T06:57:22.0400222Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::my_pe:0 2025-09-07T06:57:22.0401347Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::n_pes:0, line 623 <- wrt source file 2025-09-07T06:57:22.0402485Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::n_pes:0 2025-09-07T06:57:22.0403623Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::barrier_all:0, line 674 <- wrt source file 2025-09-07T06:57:22.0404812Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::barrier_all:0 2025-09-07T06:57:22.0405959Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::sync_all:0, line 720 <- wrt source file 2025-09-07T06:57:22.0407118Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::sync_all:0 2025-09-07T06:57:22.0408264Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::alltoall:0, line 759 <- wrt source file 2025-09-07T06:57:22.0409440Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::alltoall:0 2025-09-07T06:57:22.0410665Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::broadcast:0, line 814 <- wrt source file 2025-09-07T06:57:22.0411845Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::broadcast:0 2025-09-07T06:57:22.0413047Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::reduce:0, line 875 <- wrt source file 2025-09-07T06:57:22.0414323Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::reduce:0 2025-09-07T06:57:22.0415518Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::reduce_extern_wrapper:0, line 921 <- wrt source file 2025-09-07T06:57:22.0416783Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/_symmetric_memory/_nvshmem_triton.py::reduce_extern_wrapper:0 2025-09-07T06:57:22.0417907Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/join.py::Join:0, line 141 <- wrt source file 2025-09-07T06:57:22.0418925Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/join.py::Join:0 2025-09-07T06:57:22.0420058Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/__init__.py::register_ddp_comm_hook:0, line 107 <- wrt source file 2025-09-07T06:57:22.0421346Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/__init__.py::register_ddp_comm_hook:0 2025-09-07T06:57:22.0422635Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py::post_localSGD_hook:0, line 91 <- wrt source file 2025-09-07T06:57:22.0423983Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py::post_localSGD_hook:0 2025-09-07T06:57:22.0425450Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::allreduce_hook:0, line 49 <- wrt source file 2025-09-07T06:57:22.0426750Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::allreduce_hook:0 2025-09-07T06:57:22.0428003Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_hook:0, line 104 <- wrt source file 2025-09-07T06:57:22.0429288Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_hook:0 2025-09-07T06:57:22.0430557Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_hook:0, line 125 <- wrt source file 2025-09-07T06:57:22.0431858Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_hook:0 2025-09-07T06:57:22.0433145Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_wrapper:0, line 143 <- wrt source file 2025-09-07T06:57:22.0434473Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_wrapper:0 2025-09-07T06:57:22.0435795Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_wrapper:0, line 182 <- wrt source file 2025-09-07T06:57:22.0437222Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_wrapper:0 2025-09-07T06:57:22.0438649Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_pertensor_hook:0, line 64 <- wrt source file 2025-09-07T06:57:22.0440156Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_pertensor_hook:0 2025-09-07T06:57:22.0441549Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_perchannel_hook:0, line 145 <- wrt source file 2025-09-07T06:57:22.0442973Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_perchannel_hook:0 2025-09-07T06:57:22.0444287Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::powerSGD_hook:0, line 395 <- wrt source file 2025-09-07T06:57:22.0445558Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::powerSGD_hook:0 2025-09-07T06:57:22.0446826Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::batched_powerSGD_hook:0, line 708 <- wrt source file 2025-09-07T06:57:22.0448146Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::batched_powerSGD_hook:0 2025-09-07T06:57:22.0449402Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py::noop_hook:0, line 23 <- wrt source file 2025-09-07T06:57:22.0450656Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py::noop_hook:0 2025-09-07T06:57:22.0452107Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py::HierarchicalModelAverager:0, line 54 <- wrt source file 2025-09-07T06:57:22.0453639Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py::HierarchicalModelAverager:0 2025-09-07T06:57:22.0455106Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/averagers.py::PeriodicModelAverager:0, line 57 <- wrt source file 2025-09-07T06:57:22.0456441Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/averagers.py::PeriodicModelAverager:0 2025-09-07T06:57:22.0457731Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_apply_optimizer_in_backward:0, line 43 <- wrt source file 2025-09-07T06:57:22.0459055Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_apply_optimizer_in_backward:0 2025-09-07T06:57:22.0460345Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_get_in_backward_optimizers:0, line 114 <- wrt source file 2025-09-07T06:57:22.0461651Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_get_in_backward_optimizers:0 2025-09-07T06:57:22.0463037Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/zero_redundancy_optimizer.py::ZeroRedundancyOptimizer:0, line 335 <- wrt source file 2025-09-07T06:57:22.0464340Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/zero_redundancy_optimizer.py::ZeroRedundancyOptimizer:0 2025-09-07T06:57:22.0465646Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/named_optimizer.py::_NamedOptimizer:0, line 43 <- wrt source file 2025-09-07T06:57:22.0466872Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/named_optimizer.py::_NamedOptimizer:0 2025-09-07T06:57:22.0468055Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/post_localSGD_optimizer.py::PostLocalSGDOptimizer:0, line 19 <- wrt source file 2025-09-07T06:57:22.0469321Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/post_localSGD_optimizer.py::PostLocalSGDOptimizer:0 2025-09-07T06:57:22.0470515Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/optimizer.py::DistributedOptimizer:0, line 162 <- wrt source file 2025-09-07T06:57:22.0471675Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/optimizer.py::DistributedOptimizer:0 2025-09-07T06:57:22.0472793Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/utils.py::register_functional_optim:0, line 37 <- wrt source file 2025-09-07T06:57:22.0473931Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/optim/utils.py::register_functional_optim:0 2025-09-07T06:57:22.0474948Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/amp/grad_scaler.py::GradScaler:0, line 64 <- wrt source file 2025-09-07T06:57:22.0475903Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/amp/grad_scaler.py::GradScaler:0 2025-09-07T06:57:22.0476824Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/triton.py::triton_op:0, line 136 <- wrt source file 2025-09-07T06:57:22.0477861Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/triton.py::triton_op:0 2025-09-07T06:57:22.0478796Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/triton.py::wrap_triton:0, line 307 <- wrt source file 2025-09-07T06:57:22.0479737Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/triton.py::wrap_triton:0 2025-09-07T06:57:22.0480755Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/fake_impl.py::FakeImplCtx.new_dynamic_size:0, line 175 <- wrt source file 2025-09-07T06:57:22.0775576Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/fake_impl.py::FakeImplCtx.new_dynamic_size:0 2025-09-07T06:57:22.0777392Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::custom_op:0, line 98 <- wrt source file 2025-09-07T06:57:22.1119696Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::custom_op:0 2025-09-07T06:57:22.1121654Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.set_kernel_enabled:0, line 238 <- wrt source file 
2025-09-07T06:57:22.1213115Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.set_kernel_enabled:0 2025-09-07T06:57:22.1215197Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_kernel:0, line 307 <- wrt source file 2025-09-07T06:57:22.1217578Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_kernel:0 2025-09-07T06:57:22.1218691Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_autograd:0, line 541 <- wrt source file 2025-09-07T06:57:22.1396370Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_autograd:0 2025-09-07T06:57:22.1398494Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_vmap:0, line 709 <- wrt source file 2025-09-07T06:57:22.1582308Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_vmap:0 2025-09-07T06:57:22.1584205Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_autocast:0, line 795 <- wrt source file 2025-09-07T06:57:22.1586132Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py::CustomOpDef.register_autocast:0 2025-09-07T06:57:22.1587913Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/infer_schema.py::infer_schema:0, line 51 <- wrt source file 2025-09-07T06:57:22.1589392Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/infer_schema.py::infer_schema:0 2025-09-07T06:57:22.1590795Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/fake_class_registry.py::register_fake_class:0, line 230 <- wrt source file 
2025-09-07T06:57:22.1591947Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/fake_class_registry.py::register_fake_class:0 2025-09-07T06:57:22.1592994Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/package/glob_group.py::GlobGroup:0, line 22 <- wrt source file 2025-09-07T06:57:22.1593974Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/package/glob_group.py::GlobGroup:0 2025-09-07T06:57:22.1595095Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::Dim:0, line 103 <- wrt source file 2025-09-07T06:57:22.1596079Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::Dim:0 2025-09-07T06:57:22.1597083Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::ShapesCollection:0, line 715 <- wrt source file 2025-09-07T06:57:22.1598163Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::ShapesCollection:0 2025-09-07T06:57:22.1599192Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::ShapesCollection:1, line 731 <- wrt source file 2025-09-07T06:57:22.1600233Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::ShapesCollection:1 2025-09-07T06:57:22.1601266Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::AdditionalInputs:0, line 815 <- wrt source file 2025-09-07T06:57:22.1602308Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/dynamic_shapes.py::AdditionalInputs:0 2025-09-07T06:57:22.1603334Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/decorators.py::substitute_in_graph:0, line 349 <- wrt source file 2025-09-07T06:57:22.1604366Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/decorators.py::substitute_in_graph:0 2025-09-07T06:57:22.1605547Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/base.py::VariableTracker.python_type:0, line 322 <- wrt source file 2025-09-07T06:57:22.1606694Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/base.py::VariableTracker.python_type:0 2025-09-07T06:57:22.1607926Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/profiler/profiler.py::_KinetoProfile.toggle_collection_dynamic:0, line 295 <- wrt source file 2025-09-07T06:57:22.1609199Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/profiler/profiler.py::_KinetoProfile.toggle_collection_dynamic:0 2025-09-07T06:57:22.1610265Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/profiler/profiler.py::profile:0, line 616 <- wrt source file 2025-09-07T06:57:22.1611205Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/profiler/profiler.py::profile:0 2025-09-07T06:57:22.1612120Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/scan.py::scan:0, line 156 <- wrt source file 2025-09-07T06:57:22.1613057Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/scan.py::scan:0 2025-09-07T06:57:22.1614110Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/scan.py::ScanAutogradOp:0, line 474 <- wrt source file 2025-09-07T06:57:22.1615137Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/scan.py::ScanAutogradOp:0 2025-09-07T06:57:22.1616072Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/map.py::map:0, line 79 <- wrt source file 2025-09-07T06:57:22.1616993Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/map.py::map:0 
2025-09-07T06:57:22.1617987Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/flat_apply.py::FlatApply.__call__:0, line 80 <- wrt source file 2025-09-07T06:57:22.1619165Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/flat_apply.py::FlatApply.__call__:0 2025-09-07T06:57:22.1620257Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/associative_scan.py::associative_scan:0, line 186 <- wrt source file 2025-09-07T06:57:22.1621379Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/associative_scan.py::associative_scan:0 2025-09-07T06:57:22.1622520Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/associative_scan.py::generic_associative_scan:0, line 322 <- wrt source file 2025-09-07T06:57:22.1623698Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/associative_scan.py::generic_associative_scan:0 2025-09-07T06:57:22.1624719Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/cond.py::cond:0, line 155 <- wrt source file 2025-09-07T06:57:22.1625654Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_higher_order_ops/cond.py::cond:0 2025-09-07T06:57:22.1626658Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/semi_structured.py::to_sparse_semi_structured:0, line 339 <- wrt source file 2025-09-07T06:57:22.1627750Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/semi_structured.py::to_sparse_semi_structured:0 2025-09-07T06:57:22.1628790Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/subgraph_rewriter.py::replace_pattern:0, line 125 <- wrt source file 2025-09-07T06:57:22.1629972Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/subgraph_rewriter.py::replace_pattern:0 
2025-09-07T06:57:22.1630928Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py::Interpreter:0, line 49 <- wrt source file 2025-09-07T06:57:22.1631947Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py::Interpreter:0 2025-09-07T06:57:22.1632942Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py::Transformer:0, line 480 <- wrt source file 2025-09-07T06:57:22.1633871Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py::Transformer:0 2025-09-07T06:57:22.1634768Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/tensor_type.py::TensorType:0, line 12 <- wrt source file 2025-09-07T06:57:22.1635674Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/tensor_type.py::TensorType:0 2025-09-07T06:57:22.1636574Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_consistent:0, line 65 <- wrt source file 2025-09-07T06:57:22.1637504Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_consistent:0 2025-09-07T06:57:22.1638420Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_more_precise:0, line 93 <- wrt source file 2025-09-07T06:57:22.1639360Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_more_precise:0 2025-09-07T06:57:22.1640250Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/graph.py::_snake_case:0, line 102 <- wrt source file 2025-09-07T06:57:22.1641129Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/graph.py::_snake_case:0 2025-09-07T06:57:22.1642061Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/graph.py::Graph.eliminate_dead_code:0, line 1873 <- wrt source file 2025-09-07T06:57:22.1643134Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/graph.py::Graph.eliminate_dead_code:0 2025-09-07T06:57:22.1644103Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/graph.py::Graph.on_generate_code:0, line 1967 <- wrt source file 2025-09-07T06:57:22.1645069Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/graph.py::Graph.on_generate_code:0 2025-09-07T06:57:22.1646035Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/shape_prop.py::ShapeProp:0, line 99 <- wrt source file 2025-09-07T06:57:22.1647006Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/shape_prop.py::ShapeProp:0 2025-09-07T06:57:22.1647978Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/split_module.py::split_module:0, line 89 <- wrt source file 2025-09-07T06:57:22.1648991Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/split_module.py::split_module:0 2025-09-07T06:57:22.1650066Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/graph_drawer.py::FxGraphDrawer.get_dot_graph:0, line 129 <- wrt source file 2025-09-07T06:57:22.1693009Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/graph_drawer.py::FxGraphDrawer.get_dot_graph:0 2025-09-07T06:57:22.1695333Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/utils/matcher_with_name_node_map_utils.py::SubgraphMatcherWithNameNodeMap:0, line 51 <- wrt source file 2025-09-07T06:57:22.1697938Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/passes/utils/matcher_with_name_node_map_utils.py::SubgraphMatcherWithNameNodeMap:0 2025-09-07T06:57:22.1699397Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/rewriter.py::AST_Rewriter.visit_AnnAssign:0, line 96 <- wrt source file 2025-09-07T06:57:22.1700654Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/rewriter.py::AST_Rewriter.visit_AnnAssign:0 2025-09-07T06:57:22.1701845Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unifiable:0, line 11 <- wrt source file 2025-09-07T06:57:22.1702965Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unifiable:0 2025-09-07T06:57:22.1704041Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::reify_object:0, line 37 <- wrt source file 2025-09-07T06:57:22.1705165Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::reify_object:0 2025-09-07T06:57:22.1706241Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unify_object:0, line 93 <- wrt source file 2025-09-07T06:57:22.1707344Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unify_object:0 2025-09-07T06:57:22.1708398Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/core.py::reify:0, line 58 <- wrt source file 2025-09-07T06:57:22.1709450Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/core.py::reify:0 2025-09-07T06:57:22.1710523Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/variable.py::variables:0, line 67 <- wrt source file 2025-09-07T06:57:22.1711653Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/variable.py::variables:0 2025-09-07T06:57:22.1712844Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/match.py::VarDispatcher:0, line 48 <- wrt source file 2025-09-07T06:57:22.1713990Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/match.py::VarDispatcher:0 2025-09-07T06:57:22.1715111Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge:0, line 37 <- wrt source file 2025-09-07T06:57:22.1732983Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge:0 2025-09-07T06:57:22.1734226Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge_with:0, line 64 <- wrt source file 2025-09-07T06:57:22.1737038Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge_with:0 2025-09-07T06:57:22.1738221Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valmap:0, line 90 <- wrt source file 2025-09-07T06:57:22.1740443Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valmap:0 2025-09-07T06:57:22.1741618Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keymap:0, line 106 <- wrt source file 2025-09-07T06:57:22.1743977Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keymap:0 2025-09-07T06:57:22.1745144Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemmap:0, line 122 <- wrt source file 2025-09-07T06:57:22.1747338Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemmap:0 2025-09-07T06:57:22.1748612Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valfilter:0, line 138 <- wrt source file 2025-09-07T06:57:22.1751597Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valfilter:0 2025-09-07T06:57:22.1752784Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keyfilter:0, line 158 <- wrt source file 2025-09-07T06:57:22.1755886Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keyfilter:0 2025-09-07T06:57:22.1757074Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemfilter:0, line 178 <- wrt source file 2025-09-07T06:57:22.1760993Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemfilter:0 2025-09-07T06:57:22.1762151Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc:0, line 204 <- wrt source file 2025-09-07T06:57:22.1764255Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc:0 2025-09-07T06:57:22.1765411Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::dissoc:0, line 221 <- wrt source file 2025-09-07T06:57:22.1769259Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::dissoc:0 2025-09-07T06:57:22.1770440Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc_in:0, line 247 <- wrt source file 2025-09-07T06:57:22.1773065Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc_in:0 2025-09-07T06:57:22.1774309Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::update_in:0, line 275 <- wrt source file 2025-09-07T06:57:22.1780628Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::update_in:0 2025-09-07T06:57:22.1781812Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::get_in:0, line 328 <- wrt source file 2025-09-07T06:57:22.1789630Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::get_in:0 2025-09-07T06:57:22.1790803Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::groupby:0, line 375 <- wrt source file 2025-09-07T06:57:22.1793855Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::groupby:0 2025-09-07T06:57:22.1795008Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::first:0, line 416 <- wrt source file 2025-09-07T06:57:22.1796791Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::first:0 2025-09-07T06:57:22.1797923Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::transitive_get:0, line 15 <- wrt source file 2025-09-07T06:57:22.1801282Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::transitive_get:0 2025-09-07T06:57:22.1802391Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::_toposort:0, 
line 42 <- wrt source file 2025-09-07T06:57:22.1803497Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::_toposort:0 2025-09-07T06:57:22.1804570Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::reverse_dict:0, line 70 <- wrt source file 2025-09-07T06:57:22.1805698Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::reverse_dict:0 2025-09-07T06:57:22.1806774Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::freeze:0, line 95 <- wrt source file 2025-09-07T06:57:22.1809365Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::freeze:0 2025-09-07T06:57:22.1810502Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/core.py::dispatch:0, line 20 <- wrt source file 2025-09-07T06:57:22.1813036Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/core.py::dispatch:0 2025-09-07T06:57:22.1814423Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher:0, line 113 <- wrt source file 2025-09-07T06:57:22.1815907Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher:0 2025-09-07T06:57:22.1817266Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.register:0, line 138 <- wrt source file 2025-09-07T06:57:22.1818670Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.register:0 2025-09-07T06:57:22.1820014Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.add:0, line 191 <- wrt source file 2025-09-07T06:57:22.1821401Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.add:0 2025-09-07T06:57:22.1822754Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.dispatch:0, line 304 <- wrt source file 2025-09-07T06:57:22.1824157Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.dispatch:0 2025-09-07T06:57:22.1825508Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::str_signature:0, line 434 <- wrt source file 2025-09-07T06:57:22.1826857Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::str_signature:0 2025-09-07T06:57:22.1828245Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::isvariadic:0, line 47 <- wrt source file 2025-09-07T06:57:22.1829646Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::isvariadic:0 2025-09-07T06:57:22.1830980Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::Variadic:0, line 83 <- wrt source file 2025-09-07T06:57:22.1832265Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::Variadic:0 2025-09-07T06:57:22.1833527Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::expand_tuples:0, line 18 <- wrt source file 2025-09-07T06:57:22.1834831Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::expand_tuples:0 2025-09-07T06:57:22.1836078Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::_toposort:0, line 41 <- wrt source file 2025-09-07T06:57:22.1837347Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::_toposort:0 2025-09-07T06:57:22.1838584Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::reverse_dict:0, line 68 <- wrt source file 2025-09-07T06:57:22.1839876Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::reverse_dict:0 2025-09-07T06:57:22.1841117Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::groupby:0, line 87 <- wrt source file 2025-09-07T06:57:22.1842450Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::groupby:0 2025-09-07T06:57:22.1843705Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::typename:0, line 117 <- wrt source file 2025-09-07T06:57:22.1844985Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::typename:0 2025-09-07T06:57:22.1846035Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/profiler.py::profile:0, line 75 <- wrt source file 2025-09-07T06:57:22.1846953Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/profiler.py::profile:0 2025-09-07T06:57:22.1847898Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:0, line 114 <- wrt source file 2025-09-07T06:57:22.1848872Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:0 2025-09-07T06:57:22.1849807Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:1, line 125 <- wrt source file 2025-09-07T06:57:22.1850769Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:1 2025-09-07T06:57:22.1851698Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:2, line 140 <- wrt source file 2025-09-07T06:57:22.1852646Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:2 2025-09-07T06:57:22.1853722Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_multi_output_jit_fn:0, line 173 <- wrt source file 2025-09-07T06:57:22.1854872Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_multi_output_jit_fn:0 2025-09-07T06:57:22.1855942Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/gds.py::gds_register_buffer:0, line 42 <- wrt source file 2025-09-07T06:57:22.1856965Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/gds.py::gds_register_buffer:0 2025-09-07T06:57:22.1857892Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/gds.py::gds_deregister_buffer:0, line 58 <- wrt source file 2025-09-07T06:57:22.1858835Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/gds.py::gds_deregister_buffer:0 2025-09-07T06:57:22.1859719Z * DOCTEST : 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/gds.py::GdsFile:0, line 85 <- wrt source file 2025-09-07T06:57:22.1860584Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/gds.py::GdsFile:0 2025-09-07T06:57:22.1861502Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py::aot_function:0, line 768 <- wrt source file 2025-09-07T06:57:22.2178016Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py::aot_function:0 2025-09-07T06:57:22.2179691Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::vjp:0, line 233 <- wrt source file 2025-09-07T06:57:22.2220950Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::vjp:0 2025-09-07T06:57:22.2222616Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacrev:0, line 475 <- wrt source file 2025-09-07T06:57:22.2283319Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacrev:0 2025-09-07T06:57:22.2285178Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jvp:0, line 1023 <- wrt source file 2025-09-07T06:57:22.3139904Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jvp:0 2025-09-07T06:57:22.3141427Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacfwd:0, line 1181 <- wrt source file 2025-09-07T06:57:22.3205960Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacfwd:0 2025-09-07T06:57:22.3207254Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::hessian:0, line 1341 <- wrt source file 2025-09-07T06:57:22.3225921Z * SUCCESS: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::hessian:0 2025-09-07T06:57:22.3227005Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::functionalize:0, line 1505 <- wrt source file 2025-09-07T06:57:22.3231043Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::functionalize:0 2025-09-07T06:57:22.3232126Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::linearize:0, line 1704 <- wrt source file 2025-09-07T06:57:22.3402981Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::linearize:0 2025-09-07T06:57:22.3404503Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/benchmark_utils.py::benchmark_utilization:0, line 184 <- wrt source file 2025-09-07T06:57:22.3405904Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/benchmark_utils.py::benchmark_utilization:0 2025-09-07T06:57:22.3407330Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/functional_call.py::functional_call:0, line 36 <- wrt source file 2025-09-07T06:57:22.3409274Z * SUCCESS: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/functional_call.py::functional_call:0 2025-09-07T06:57:22.3410508Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/fx_minifier.py::minifier:0, line 194 <- wrt source file 2025-09-07T06:57:22.3411726Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/fx_minifier.py::minifier:0 2025-09-07T06:57:22.3412841Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py::CompilerWrapper.post_compile:0, line 1131 <- wrt source file 2025-09-07T06:57:22.3414171Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py::CompilerWrapper.post_compile:0 2025-09-07T06:57:22.3415396Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py::InductorWrapper.post_compile:0, line 1186 <- wrt source file 2025-09-07T06:57:22.3416632Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py::InductorWrapper.post_compile:0 2025-09-07T06:57:22.3417714Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LambdaLR:0, line 283 <- wrt source file 2025-09-07T06:57:22.3418706Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LambdaLR:0 2025-09-07T06:57:22.3419693Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiplicativeLR:0, line 391 <- wrt source file 2025-09-07T06:57:22.3420851Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiplicativeLR:0 2025-09-07T06:57:22.3421842Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::StepLR:0, line 494 <- wrt source file 2025-09-07T06:57:22.3422777Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::StepLR:0 2025-09-07T06:57:22.3423717Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiStepLR:0, line 550 <- wrt source file 2025-09-07T06:57:22.3424682Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiStepLR:0 2025-09-07T06:57:22.3425616Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ConstantLR:0, line 611 <- wrt source file 2025-09-07T06:57:22.3426576Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ConstantLR:0 
2025-09-07T06:57:22.3427490Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LinearLR:0, line 686 <- wrt source file 2025-09-07T06:57:22.3428425Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LinearLR:0 2025-09-07T06:57:22.3429389Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ExponentialLR:0, line 776 <- wrt source file 2025-09-07T06:57:22.3430500Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ExponentialLR:0 2025-09-07T06:57:22.3431476Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::SequentialLR:0, line 823 <- wrt source file 2025-09-07T06:57:22.3432558Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::SequentialLR:0 2025-09-07T06:57:22.3433638Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::PolynomialLR:0, line 974 <- wrt source file 2025-09-07T06:57:22.3434662Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::PolynomialLR:0 2025-09-07T06:57:22.3435655Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingLR:0, line 1065 <- wrt source file 2025-09-07T06:57:22.3436708Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingLR:0 2025-09-07T06:57:22.3437730Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ChainedScheduler:0, line 1137 <- wrt source file 2025-09-07T06:57:22.3438774Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ChainedScheduler:0 2025-09-07T06:57:22.3439725Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CyclicLR:0, line 1511 <- wrt source 
file 2025-09-07T06:57:22.3440666Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CyclicLR:0 2025-09-07T06:57:22.3441699Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts:0, line 1752 <- wrt source file 2025-09-07T06:57:22.3442847Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts:0 2025-09-07T06:57:22.3443962Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:0, line 1806 <- wrt source file 2025-09-07T06:57:22.3445224Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:0 2025-09-07T06:57:22.3446374Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:1, line 1822 <- wrt source file 2025-09-07T06:57:22.3447543Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:1 2025-09-07T06:57:22.3448571Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::OneCycleLR:0, line 1960 <- wrt source file 2025-09-07T06:57:22.3449738Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::OneCycleLR:0 2025-09-07T06:57:22.3450668Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:0, line 152 <- wrt source file 2025-09-07T06:57:22.3451646Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:0 2025-09-07T06:57:22.3452577Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:1, line 178 <- wrt source file 2025-09-07T06:57:22.3453523Z * SKIPPED: 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:1 2025-09-07T06:57:22.3454503Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::update_bn:0, line 337 <- wrt source file 2025-09-07T06:57:22.3455545Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::update_bn:0 2025-09-07T06:57:22.3456430Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::SWALR:0, line 396 <- wrt source file 2025-09-07T06:57:22.3457402Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/swa_utils.py::SWALR:0 2025-09-07T06:57:22.3458449Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/optimizer.py::Optimizer.load_state_dict:0, line 890 <- wrt source file 2025-09-07T06:57:22.3459511Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/optimizer.py::Optimizer.load_state_dict:0 2025-09-07T06:57:22.3460481Z * DOCTEST : /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_logging/_internal.py::set_logs:0, line 459 <- wrt source file 2025-09-07T06:57:22.3461420Z * SKIPPED: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_logging/_internal.py::set_logs:0 2025-09-07T06:57:22.3461941Z ============ 2025-09-07T06:57:22.3462182Z Finished doctests 2025-09-07T06:57:22.3462388Z 374 / 863 passed 2025-09-07T06:57:22.3462606Z  2025-09-07T06:57:22.3462874Z === Found 17 parse-time warnings === 2025-09-07T06:57:22.3463231Z --- Parse Warning: 1 / 17 --- 2025-09-07T06:57:22.3464118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=Library.fallback in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py line=375. 
2025-09-07T06:57:22.3465079Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3465575Z Registers the function implementation as the fallback for the given key. 2025-09-07T06:57:22.3465951Z 2025-09-07T06:57:22.3466237Z This function only works for a library with global namespace ("_"). 2025-09-07T06:57:22.3466586Z 2025-09-07T06:57:22.3466762Z Args: 2025-09-07T06:57:22.3467114Z fn: function used as fallback for the given dispatch key or :func:`~fallthrough_kernel` 2025-09-07T06:57:22.3467544Z to register a fallthrough. 2025-09-07T06:57:22.3468119Z dispatch_key: dispatch key that the input function should be registered for. By default, it uses 2025-09-07T06:57:22.3468632Z the dispatch key that the library was created with. 2025-09-07T06:57:22.3469166Z with_keyset: flag controlling if the current dispatcher call keyset should be passed as the first argument 2025-09-07T06:57:22.3469817Z to :attr:`fn` when calling. This should be used to create the appropriate keyset for redispatch calls. 2025-09-07T06:57:22.3470242Z 2025-09-07T06:57:22.3470418Z Example:: 2025-09-07T06:57:22.3470624Z 2025-09-07T06:57:22.3470821Z >>> my_lib = Library("_", "IMPL") 2025-09-07T06:57:22.3471130Z >>> def fallback_kernel(op, *args, **kwargs): 2025-09-07T06:57:22.3471448Z >>> # Handle all autocast ops generically 2025-09-07T06:57:22.3471730Z >>> # ... 
2025-09-07T06:57:22.3471998Z >>> my_lib.fallback(fallback_kernel, "Autocast") 2025-09-07T06:57:22.3472288Z 2025-09-07T06:57:22.3472915Z Original Error: IndentationError('expected an indented block after function definition on line 2', ('', 5, 1, 'my_lib.fallback(fallback_kernel, "Autocast")\n', 5, 7)) 2025-09-07T06:57:22.3473572Z 2025-09-07T06:57:22.3473782Z my_lib.fallback(fallback_kernel, "Autocast") 2025-09-07T06:57:22.3474060Z ^ 2025-09-07T06:57:22.3474241Z warnings.warn(msg) 2025-09-07T06:57:22.3474458Z 2025-09-07T06:57:22.3474709Z --- Parse Warning: 2 / 17 --- 2025-09-07T06:57:22.3475649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=register_fake in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py line=948. 2025-09-07T06:57:22.3476583Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3477077Z Register a FakeTensor implementation ("fake impl") for this operator. 2025-09-07T06:57:22.3477496Z 2025-09-07T06:57:22.3477813Z Also sometimes known as a "meta kernel", "abstract impl". 2025-09-07T06:57:22.3478124Z 2025-09-07T06:57:22.3478417Z An "FakeTensor implementation" specifies the behavior of this operator on 2025-09-07T06:57:22.3478891Z Tensors that carry no data ("FakeTensor"). Given some input Tensors with 2025-09-07T06:57:22.3479367Z certain properties (sizes/strides/storage_offset/device), it specifies 2025-09-07T06:57:22.3479779Z what the properties of the output Tensors are. 2025-09-07T06:57:22.3480065Z 2025-09-07T06:57:22.3480350Z The FakeTensor implementation has the same signature as the operator. 2025-09-07T06:57:22.3480820Z It is run for both FakeTensors and meta tensors. 
To write a FakeTensor 2025-09-07T06:57:22.3481276Z implementation, assume that all Tensor inputs to the operator are 2025-09-07T06:57:22.3481732Z regular CPU/CUDA/Meta tensors, but they do not have storage, and 2025-09-07T06:57:22.3482173Z you are trying to return regular CPU/CUDA/Meta tensor(s) as output. 2025-09-07T06:57:22.3482635Z The FakeTensor implementation must consist of only PyTorch operations 2025-09-07T06:57:22.3483082Z (and may not directly access the storage or data of any input or 2025-09-07T06:57:22.3483431Z intermediate Tensors). 2025-09-07T06:57:22.3483672Z 2025-09-07T06:57:22.3483912Z This API may be used as a decorator (see examples). 2025-09-07T06:57:22.3484218Z 2025-09-07T06:57:22.3484443Z For a detailed guide on custom ops, please see 2025-09-07T06:57:22.3484855Z https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html 2025-09-07T06:57:22.3485211Z 2025-09-07T06:57:22.3485387Z Args: 2025-09-07T06:57:22.3485689Z op_name: Operator name (along with the overload) or OpOverload object. 2025-09-07T06:57:22.3486153Z func: Fake tensor implementation. 2025-09-07T06:57:22.3486520Z lib (Optional[Library]): Library to register the fake tensor to. 2025-09-07T06:57:22.3486933Z allow_override: Flag controlling if we want to override an 2025-09-07T06:57:22.3487328Z existing registered fake impl. This is by default off, 2025-09-07T06:57:22.3487718Z and will error you're trying to register a fake impl to 2025-09-07T06:57:22.3488114Z an operator that already has a fake impl. This also only 2025-09-07T06:57:22.3488494Z applies if the custom operator was not created via 2025-09-07T06:57:22.3488890Z torch.library.custom_op, as overriding and existing fake 2025-09-07T06:57:22.3489251Z impl is already allowed. 
2025-09-07T06:57:22.3489514Z 2025-09-07T06:57:22.3489686Z Examples: 2025-09-07T06:57:22.3489898Z >>> import torch 2025-09-07T06:57:22.3490144Z >>> import numpy as np 2025-09-07T06:57:22.3490416Z >>> from torch import Tensor 2025-09-07T06:57:22.3490675Z >>> 2025-09-07T06:57:22.3490955Z >>> # Example 1: an operator without data-dependent output shape 2025-09-07T06:57:22.3491403Z >>> @torch.library.custom_op("mylib::custom_linear", mutates_args=()) 2025-09-07T06:57:22.3491870Z >>> def custom_linear(x: Tensor, weight: Tensor, bias: Tensor) -> Tensor: 2025-09-07T06:57:22.3492305Z >>> raise NotImplementedError("Implementation goes here") 2025-09-07T06:57:22.3492763Z >>> 2025-09-07T06:57:22.3493033Z >>> @torch.library.register_fake("mylib::custom_linear") 2025-09-07T06:57:22.3493369Z >>> def _(x, weight, bias): 2025-09-07T06:57:22.3493628Z >>> assert x.dim() == 2 2025-09-07T06:57:22.3493971Z >>> assert weight.dim() == 2 2025-09-07T06:57:22.3494255Z >>> assert bias.dim() == 1 2025-09-07T06:57:22.3494654Z >>> assert x.shape[1] == weight.shape[1] 2025-09-07T06:57:22.3495063Z >>> assert weight.shape[0] == bias.shape[0] 2025-09-07T06:57:22.3495383Z >>> assert x.device == weight.device 2025-09-07T06:57:22.3495659Z >>> 2025-09-07T06:57:22.3495883Z >>> return (x @ weight.t()) + bias 2025-09-07T06:57:22.3496156Z >>> 2025-09-07T06:57:22.3496410Z >>> with torch._subclasses.fake_tensor.FakeTensorMode(): 2025-09-07T06:57:22.3496739Z >>> x = torch.randn(2, 3) 2025-09-07T06:57:22.3497020Z >>> w = torch.randn(3, 3) 2025-09-07T06:57:22.3497286Z >>> b = torch.randn(3) 2025-09-07T06:57:22.3497583Z >>> y = torch.ops.mylib.custom_linear(x, w, b) 2025-09-07T06:57:22.3497883Z >>> 2025-09-07T06:57:22.3498099Z >>> assert y.shape == (2, 3) 2025-09-07T06:57:22.3498358Z >>> 2025-09-07T06:57:22.3498635Z >>> # Example 2: an operator with data-dependent output shape 2025-09-07T06:57:22.3499069Z >>> @torch.library.custom_op("mylib::custom_nonzero", mutates_args=()) 2025-09-07T06:57:22.3499464Z >>> def 
custom_nonzero(x: Tensor) -> Tensor: 2025-09-07T06:57:22.3499764Z >>> x_np = x.numpy(force=True) 2025-09-07T06:57:22.3500068Z >>> res = np.stack(np.nonzero(x_np), axis=1) 2025-09-07T06:57:22.3500400Z >>> return torch.tensor(res, device=x.device) 2025-09-07T06:57:22.3500683Z >>> 2025-09-07T06:57:22.3500935Z >>> @torch.library.register_fake("mylib::custom_nonzero") 2025-09-07T06:57:22.3501264Z >>> def _(x): 2025-09-07T06:57:22.3501538Z >>> # Number of nonzero-elements is data-dependent. 2025-09-07T06:57:22.3501899Z >>> # Since we cannot peek at the data in an fake impl, 2025-09-07T06:57:22.3502264Z >>> # we use the ctx object to construct a new symint that 2025-09-07T06:57:22.3502714Z >>> # represents the data-dependent size. 2025-09-07T06:57:22.3503035Z >>> ctx = torch.library.get_ctx() 2025-09-07T06:57:22.3503330Z >>> nnz = ctx.new_dynamic_size() 2025-09-07T06:57:22.3503618Z >>> shape = [nnz, x.dim()] 2025-09-07T06:57:22.3503937Z >>> result = x.new_empty(shape, dtype=torch.int64) 2025-09-07T06:57:22.3504250Z >>> return result 2025-09-07T06:57:22.3504488Z >>> 2025-09-07T06:57:22.3504750Z >>> from torch.fx.experimental.proxy_tensor import make_fx 2025-09-07T06:57:22.3505068Z >>> 2025-09-07T06:57:22.3505290Z >>> x = torch.tensor([0, 1, 2, 3, 4, 0]) 2025-09-07T06:57:22.3505689Z >>> trace = make_fx(torch.ops.mylib.custom_nonzero, tracing_mode="symbolic")(x) 2025-09-07T06:57:22.3506093Z >>> trace.print_readable() 2025-09-07T06:57:22.3506349Z >>> 2025-09-07T06:57:22.3506651Z >>> assert torch.allclose(trace(x), torch.ops.mylib.custom_nonzero(x)) 2025-09-07T06:57:22.3507010Z 2025-09-07T06:57:22.3507193Z 2025-09-07T06:57:22.3507734Z Original Error: IndentationError('expected an indented block after function definition on line 37', ('', 38, 1, '_._ = None\n', 38, 2)) 2025-09-07T06:57:22.3508320Z 2025-09-07T06:57:22.3508497Z _._ = None 2025-09-07T06:57:22.3508679Z ^ 2025-09-07T06:57:22.3508883Z warnings.warn(msg) 2025-09-07T06:57:22.3509116Z 2025-09-07T06:57:22.3509379Z --- 
Parse Warning: 3 / 17 --- 2025-09-07T06:57:22.3510332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=get_kernel in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py line=1482. 2025-09-07T06:57:22.3511281Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3511766Z Returns the computed kernel for a given operator and dispatch key. 2025-09-07T06:57:22.3512104Z 2025-09-07T06:57:22.3512479Z This function retrieves the kernel that would be executed for a given 2025-09-07T06:57:22.3513022Z operator and dispatch key combination. The returned SafeKernelFunction 2025-09-07T06:57:22.3513474Z can be used to call the kernel in a boxed fashion. The intended use 2025-09-07T06:57:22.3513896Z case for this function is to retrieve the original kernel for a given 2025-09-07T06:57:22.3514344Z dispatch key and then register another kernel to the same dispatch key 2025-09-07T06:57:22.3514764Z that calls into the original kernel for certain cases. 2025-09-07T06:57:22.3515057Z 2025-09-07T06:57:22.3515226Z Args: 2025-09-07T06:57:22.3515506Z op: Operator name (along with the overload) or OpOverload object 2025-09-07T06:57:22.3515954Z Can be a string (e.g., "aten::add.Tensor"), an OpOverload, or a CustomOpDef. 2025-09-07T06:57:22.3516452Z dispatch_key (str | torch.DispatchKey): The dispatch key to get the kernel for. 2025-09-07T06:57:22.3516912Z Can be a string (e.g., "CPU", "CUDA") or a DispatchKey enum value. 2025-09-07T06:57:22.3517233Z 2025-09-07T06:57:22.3517404Z Returns: 2025-09-07T06:57:22.3517719Z torch._C._SafeKernelFunction: A safe kernel function that can be used to 2025-09-07T06:57:22.3518093Z call the kernel. 2025-09-07T06:57:22.3518324Z 2025-09-07T06:57:22.3518492Z Raises: 2025-09-07T06:57:22.3518723Z RuntimeError: If the operator does not exist. 
2025-09-07T06:57:22.3519011Z 2025-09-07T06:57:22.3519178Z Example: 2025-09-07T06:57:22.3519392Z >>> # Get the CPU kernel for torch.add 2025-09-07T06:57:22.3519741Z >>> kernel = torch.library.get_kernel("aten::add.Tensor", "CPU") 2025-09-07T06:57:22.3520060Z >>> 2025-09-07T06:57:22.3520274Z >>> # You can also use DispatchKey enum 2025-09-07T06:57:22.3520769Z >>> kernel = torch.library.get_kernel("aten::add.Tensor", torch.DispatchKey.CPU) 2025-09-07T06:57:22.3521150Z >>> 2025-09-07T06:57:22.3521356Z >>> # Or use an OpOverload directly 2025-09-07T06:57:22.3521732Z >>> kernel = torch.library.get_kernel(torch.ops.aten.add.Tensor, "CPU") 2025-09-07T06:57:22.3522080Z >>> 2025-09-07T06:57:22.3522352Z >>> # Example: Using get_kernel in a custom op with conditional dispatch 2025-09-07T06:57:22.3522737Z >>> # Get the original kernel for torch.sin 2025-09-07T06:57:22.3523117Z >>> original_sin_kernel = torch.library.get_kernel("aten::sin", "CPU") 2025-09-07T06:57:22.3523459Z >>> 2025-09-07T06:57:22.3523754Z >>> # If input has negative values, use original sin, otherwise return zeros 2025-09-07T06:57:22.3524155Z >>> def conditional_sin_impl(dispatch_keys, x): 2025-09-07T06:57:22.3524450Z >>> if (x < 0).any(): 2025-09-07T06:57:22.3524772Z >>> return original_sin_kernel.call_boxed(dispatch_keys, x) 2025-09-07T06:57:22.3525106Z >>> else: 2025-09-07T06:57:22.3525350Z >>> return torch.zeros_like(x) 2025-09-07T06:57:22.3525613Z >>> 2025-09-07T06:57:22.3525847Z >>> lib = torch.library.Library("aten", "IMPL") 2025-09-07T06:57:22.3526289Z >>> # with_keyset=True so the first argument to the impl is the current DispatchKeySet 2025-09-07T06:57:22.3526757Z >>> which needs to be the first argument to ``kernel.call_boxed`` 2025-09-07T06:57:22.3527257Z >>> lib.impl("sin", conditional_sin_impl, "CPU", with_keyset=True) 2025-09-07T06:57:22.3527588Z >>> 2025-09-07T06:57:22.3527810Z >>> # Test the conditional behavior 2025-09-07T06:57:22.3528113Z >>> x_positive = torch.tensor([1.0, 2.0]) 
2025-09-07T06:57:22.3528415Z >>> x_mixed = torch.tensor([-1.0, 2.0]) 2025-09-07T06:57:22.3528706Z >>> torch.sin(x_positive) 2025-09-07T06:57:22.3529037Z tensor([0., 0.]) 2025-09-07T06:57:22.3529340Z >>> torch.sin(x_mixed) 2025-09-07T06:57:22.3529590Z tensor([-0.8415, 0.9093]) 2025-09-07T06:57:22.3529824Z 2025-09-07T06:57:22.3530316Z Original Error: SyntaxError('invalid syntax', ('', 23, 7, 'which needs to be the first argument to ``kernel.call_boxed``\n', 23, 12)) 2025-09-07T06:57:22.3530856Z 2025-09-07T06:57:22.3531101Z which needs to be the first argument to ``kernel.call_boxed`` 2025-09-07T06:57:22.3531427Z ^ 2025-09-07T06:57:22.3531621Z warnings.warn(msg) 2025-09-07T06:57:22.3531835Z 2025-09-07T06:57:22.3532087Z --- Parse Warning: 4 / 17 --- 2025-09-07T06:57:22.3532982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=is_available in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py line=66. 2025-09-07T06:57:22.3534172Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3534680Z Check if the current accelerator is available at runtime: it was build, all the 2025-09-07T06:57:22.3535159Z required drivers are available and at least one device is visible. 2025-09-07T06:57:22.3535560Z See :ref:`accelerator` for details. 2025-09-07T06:57:22.3535844Z 2025-09-07T06:57:22.3536020Z Returns: 2025-09-07T06:57:22.3536363Z bool: A boolean indicating if there is an available :ref:`accelerator`. 2025-09-07T06:57:22.3536752Z 2025-09-07T06:57:22.3537049Z .. note:: This API delegates to the device-specific version of `is_available`. 2025-09-07T06:57:22.3537548Z On CUDA, when the environment variable ``PYTORCH_NVML_BASED_CUDA_CHECK=1`` is set, 2025-09-07T06:57:22.3538166Z this function will NOT poison fork. Otherwise, it will. For more details, see 2025-09-07T06:57:22.3538602Z :ref:`multiprocessing-poison-fork-note`. 
2025-09-07T06:57:22.3538895Z 2025-09-07T06:57:22.3539075Z Example:: 2025-09-07T06:57:22.3539274Z 2025-09-07T06:57:22.3539592Z >>> assert torch.accelerator.is_available() "No available accelerators detected." 2025-09-07T06:57:22.3539975Z 2025-09-07T06:57:22.3540531Z Original Error: SyntaxError('invalid syntax', ('', 1, 41, 'assert torch.accelerator.is_available() "No available accelerators detected."\n', 1, 78)) 2025-09-07T06:57:22.3541136Z 2025-09-07T06:57:22.3541452Z assert torch.accelerator.is_available() "No available accelerators detected." 2025-09-07T06:57:22.3541861Z ^ 2025-09-07T06:57:22.3542126Z warnings.warn(msg) 2025-09-07T06:57:22.3542334Z 2025-09-07T06:57:22.3542586Z --- Parse Warning: 5 / 17 --- 2025-09-07T06:57:22.3543485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=synchronize in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/accelerator/__init__.py line=212. 2025-09-07T06:57:22.3544482Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3544955Z Wait for all kernels in all streams on the given device to complete. 2025-09-07T06:57:22.3545294Z 2025-09-07T06:57:22.3545469Z Args: 2025-09-07T06:57:22.3545844Z device (:class:`torch.device`, str, int, optional): device for which to synchronize. It must match 2025-09-07T06:57:22.3546494Z the current :ref:`accelerator` device type. If not given, 2025-09-07T06:57:22.3546948Z use :func:`torch.accelerator.current_device_index` by default. 2025-09-07T06:57:22.3547270Z 2025-09-07T06:57:22.3547617Z .. note:: This function is a no-op if the current :ref:`accelerator` is not initialized. 2025-09-07T06:57:22.3548031Z 2025-09-07T06:57:22.3548205Z Example:: 2025-09-07T06:57:22.3548469Z 2025-09-07T06:57:22.3548761Z >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA) 2025-09-07T06:57:22.3549190Z >>> assert torch.accelerator.is_available() "No available accelerators detected." 
2025-09-07T06:57:22.3549621Z >>> start_event = torch.Event(enable_timing=True) 2025-09-07T06:57:22.3549949Z >>> end_event = torch.Event(enable_timing=True) 2025-09-07T06:57:22.3550247Z >>> start_event.record() 2025-09-07T06:57:22.3550621Z >>> tensor = torch.randn(100, device=torch.accelerator.current_accelerator()) 2025-09-07T06:57:22.3551006Z >>> sum = torch.sum(tensor) 2025-09-07T06:57:22.3551268Z >>> end_event.record() 2025-09-07T06:57:22.3551550Z >>> torch.accelerator.synchronize() 2025-09-07T06:57:22.3551894Z >>> elapsed_time_ms = start_event.elapsed_time(end_event) 2025-09-07T06:57:22.3552202Z 2025-09-07T06:57:22.3552758Z Original Error: SyntaxError('invalid syntax', ('', 2, 41, 'assert torch.accelerator.is_available() "No available accelerators detected."\n', 2, 78)) 2025-09-07T06:57:22.3553363Z 2025-09-07T06:57:22.3553671Z assert torch.accelerator.is_available() "No available accelerators detected." 2025-09-07T06:57:22.3554068Z ^ 2025-09-07T06:57:22.3554329Z warnings.warn(msg) 2025-09-07T06:57:22.3554547Z 2025-09-07T06:57:22.3554789Z --- Parse Warning: 6 / 17 --- 2025-09-07T06:57:22.3555638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=cudart in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py line=434. 2025-09-07T06:57:22.3556582Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3556987Z Retrieves the CUDA runtime API module. 2025-09-07T06:57:22.3557325Z 2025-09-07T06:57:22.3557496Z 2025-09-07T06:57:22.3557803Z This function initializes the CUDA runtime environment if it is not already 2025-09-07T06:57:22.3558295Z initialized and returns the CUDA runtime API module (_cudart). The CUDA 2025-09-07T06:57:22.3558747Z runtime API module provides access to various CUDA runtime functions. 
2025-09-07T06:57:22.3559101Z 2025-09-07T06:57:22.3559278Z Args: 2025-09-07T06:57:22.3559470Z ``None`` 2025-09-07T06:57:22.3559673Z 2025-09-07T06:57:22.3559852Z Returns: 2025-09-07T06:57:22.3560105Z module: The CUDA runtime API module (_cudart). 2025-09-07T06:57:22.3560391Z 2025-09-07T06:57:22.3560557Z Raises: 2025-09-07T06:57:22.3560868Z RuntimeError: If CUDA cannot be re-initialized in a forked subprocess. 2025-09-07T06:57:22.3561429Z AssertionError: If PyTorch is not compiled with CUDA support or if libcudart functions are unavailable. 2025-09-07T06:57:22.3561888Z 2025-09-07T06:57:22.3562100Z Example of CUDA operations with profiling: 2025-09-07T06:57:22.3562388Z >>> import torch 2025-09-07T06:57:22.3562656Z >>> from torch.cuda import cudart, check_error 2025-09-07T06:57:22.3562945Z >>> import os 2025-09-07T06:57:22.3563159Z >>> 2025-09-07T06:57:22.3563376Z >>> os.environ["CUDA_PROFILE"] = "1" 2025-09-07T06:57:22.3563633Z >>> 2025-09-07T06:57:22.3563863Z >>> def perform_cuda_operations_with_streams(): 2025-09-07T06:57:22.3564174Z >>> stream = torch.cuda.Stream() 2025-09-07T06:57:22.3564553Z >>> with torch.cuda.stream(stream): 2025-09-07T06:57:22.3564850Z >>> x = torch.randn(100, 100, device='cuda') 2025-09-07T06:57:22.3565165Z >>> y = torch.randn(100, 100, device='cuda') 2025-09-07T06:57:22.3565456Z >>> z = torch.mul(x, y) 2025-09-07T06:57:22.3565715Z >>> return z 2025-09-07T06:57:22.3565933Z >>> 2025-09-07T06:57:22.3566224Z >>> torch.cuda.synchronize() 2025-09-07T06:57:22.3566585Z >>> print("====== Start nsys profiling ======") 2025-09-07T06:57:22.3566910Z >>> check_error(cudart().cudaProfilerStart()) 2025-09-07T06:57:22.3567235Z >>> with torch.autograd.profiler.emit_nvtx(): 2025-09-07T06:57:22.3567574Z >>> result = perform_cuda_operations_with_streams() 2025-09-07T06:57:22.3567903Z >>> print("CUDA operations completed.") 2025-09-07T06:57:22.3568239Z >>> check_error(torch.cuda.cudart().cudaProfilerStop()) 2025-09-07T06:57:22.3568573Z >>> print("====== End nsys 
profiling ======") 2025-09-07T06:57:22.3568844Z 2025-09-07T06:57:22.3569114Z To run this example and save the profiling information, execute: 2025-09-07T06:57:22.3569674Z >>> $ nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py 2025-09-07T06:57:22.3570131Z 2025-09-07T06:57:22.3570442Z This command profiles the CUDA operations in the provided script and saves 2025-09-07T06:57:22.3570905Z the profiling information to a file named `trace_name.prof`. 2025-09-07T06:57:22.3571353Z The `--profile-from-start off` option ensures that profiling starts only 2025-09-07T06:57:22.3571766Z after the `cudaProfilerStart` call in the script. 2025-09-07T06:57:22.3572168Z The `--csv` and `--print-summary` options format the profiling output as a 2025-09-07T06:57:22.3572555Z CSV file and print a summary, respectively. 2025-09-07T06:57:22.3572988Z The `-o` option specifies the output file name, and the `-f` option forces the 2025-09-07T06:57:22.3573422Z overwrite of the output file if it already exists. 2025-09-07T06:57:22.3573728Z 2025-09-07T06:57:22.3574524Z Original Error: SyntaxError('invalid syntax', ('', 1, 1, '$ nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py\n', 1, 2)) 2025-09-07T06:57:22.3575221Z 2025-09-07T06:57:22.3575618Z $ nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py 2025-09-07T06:57:22.3590429Z ^ 2025-09-07T06:57:22.3590706Z warnings.warn(msg) 2025-09-07T06:57:22.3590964Z 2025-09-07T06:57:22.3591289Z --- Parse Warning: 7 / 17 --- 2025-09-07T06:57:22.3592504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=ActivationSparsifier in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py line=16. 
2025-09-07T06:57:22.3593751Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3594144Z 2025-09-07T06:57:22.3594481Z The Activation sparsifier class aims to sparsify/prune activations in a neural 2025-09-07T06:57:22.3595000Z network. The idea is to attach the sparsifier to a layer (or layers) and it 2025-09-07T06:57:22.3595502Z zeroes out the activations based on the mask_fn (or sparsification function) 2025-09-07T06:57:22.3595896Z input by the user. 2025-09-07T06:57:22.3596233Z The mask_fn is applied once all the inputs are aggregated and reduced i.e. 2025-09-07T06:57:22.3596652Z mask = mask_fn(reduce_fn(aggregate_fn(activations))) 2025-09-07T06:57:22.3596957Z 2025-09-07T06:57:22.3597151Z Note:: 2025-09-07T06:57:22.3597525Z The sparsification mask is computed on the input **before it goes through the attached layer**. 2025-09-07T06:57:22.3598113Z 2025-09-07T06:57:22.3598295Z Args: 2025-09-07T06:57:22.3598498Z model (nn.Module): 2025-09-07T06:57:22.3598853Z The model whose layers will be sparsified. The layers that needs to be 2025-09-07T06:57:22.3599364Z sparsified should be added separately using the register_layer() function 2025-09-07T06:57:22.3599773Z aggregate_fn (Optional, Callable): 2025-09-07T06:57:22.3600368Z default aggregate_fn that is used if not specified while registering the layer. 2025-09-07T06:57:22.3600829Z specifies how inputs should be aggregated over time. 2025-09-07T06:57:22.3601307Z The aggregate_fn should usually take 2 torch tensors and return the aggregated tensor. 2025-09-07T06:57:22.3601719Z Example 2025-09-07T06:57:22.3602006Z def add_agg_fn(tensor1, tensor2): return tensor1 + tensor2 2025-09-07T06:57:22.3602367Z reduce_fn (Optional, Callable): 2025-09-07T06:57:22.3602768Z default reduce_fn that is used if not specified while registering the layer. 2025-09-07T06:57:22.3603283Z reduce_fn will be called on the aggregated tensor i.e. 
the tensor obtained after 2025-09-07T06:57:22.3603701Z calling agg_fn() on all inputs. 2025-09-07T06:57:22.3603988Z Example 2025-09-07T06:57:22.3604319Z def mean_reduce_fn(agg_tensor): return agg_tensor.mean(dim=0) 2025-09-07T06:57:22.3604696Z mask_fn (Optional, Callable): 2025-09-07T06:57:22.3605148Z default mask_fn that is used to create the sparsification mask using the tensor obtained after 2025-09-07T06:57:22.3605718Z calling the reduce_fn(). This is used by default if a custom one is passed in the 2025-09-07T06:57:22.3606132Z register_layer(). 2025-09-07T06:57:22.3606601Z Note that the mask_fn() definition should contain the sparse arguments that is passed in sparse_config 2025-09-07T06:57:22.3607079Z arguments. 2025-09-07T06:57:22.3607340Z features (Optional, list): 2025-09-07T06:57:22.3607649Z default selected features to sparsify. 2025-09-07T06:57:22.3608164Z If this is non-empty, then the mask_fn will be applied for each feature of the input. 2025-09-07T06:57:22.3608569Z For example, 2025-09-07T06:57:22.3608951Z mask = [mask_fn(reduce_fn(aggregated_fn(input[feature])) for feature in features] 2025-09-07T06:57:22.3609365Z feature_dim (Optional, int): 2025-09-07T06:57:22.3609783Z default dimension of input features. Again, features along this dim will be chosen 2025-09-07T06:57:22.3610192Z for sparsification. 2025-09-07T06:57:22.3610462Z sparse_config (Dict): 2025-09-07T06:57:22.3610821Z Default configuration for the mask_fn. This config will be passed 2025-09-07T06:57:22.3611187Z with the mask_fn() 2025-09-07T06:57:22.3611434Z 2025-09-07T06:57:22.3611611Z Example: 2025-09-07T06:57:22.3611816Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3612065Z >>> model = SomeModel() 2025-09-07T06:57:22.3612418Z >>> act_sparsifier = ActivationSparsifier(...) 
# init activation sparsifier 2025-09-07T06:57:22.3612812Z >>> # Initialize aggregate_fn 2025-09-07T06:57:22.3613072Z >>> def agg_fn(x, y): 2025-09-07T06:57:22.3613308Z >>> return x + y 2025-09-07T06:57:22.3613536Z >>> 2025-09-07T06:57:22.3613740Z >>> # Initialize reduce_fn 2025-09-07T06:57:22.3614098Z >>> def reduce_fn(x): 2025-09-07T06:57:22.3614346Z >>> return torch.mean(x, dim=0) 2025-09-07T06:57:22.3614605Z >>> 2025-09-07T06:57:22.3614805Z >>> # Initialize mask_fn 2025-09-07T06:57:22.3615041Z >>> def mask_fn(data): 2025-09-07T06:57:22.3615429Z >>> return torch.eye(data.shape).to(data.device) 2025-09-07T06:57:22.3615720Z >>> 2025-09-07T06:57:22.3615902Z >>> 2025-09-07T06:57:22.3616117Z >>> act_sparsifier.register_layer( 2025-09-07T06:57:22.3616402Z ... model.some_layer, 2025-09-07T06:57:22.3616663Z ... aggregate_fn=agg_fn, 2025-09-07T06:57:22.3616931Z ... reduce_fn=reduce_fn, 2025-09-07T06:57:22.3617275Z ... mask_fn=mask_fn, 2025-09-07T06:57:22.3617579Z ... ) 2025-09-07T06:57:22.3617772Z >>> 2025-09-07T06:57:22.3617963Z >>> # start training process 2025-09-07T06:57:22.3618215Z >>> for _ in [...]: 2025-09-07T06:57:22.3618453Z >>> # epoch starts 2025-09-07T06:57:22.3618754Z >>> # model.forward(), compute_loss() and model.backwards() 2025-09-07T06:57:22.3619074Z >>> # epoch ends 2025-09-07T06:57:22.3619318Z >>> act_sparsifier.step() 2025-09-07T06:57:22.3619583Z >>> # end training process 2025-09-07T06:57:22.3619833Z >>> sparsifier.squash_mask() 2025-09-07T06:57:22.3620064Z 2025-09-07T06:57:22.3620550Z Original Error: IndentationError("expected an indented block after 'for' statement on line 25", ('', 26, 1, '_._ = None\n', 26, 2)) 2025-09-07T06:57:22.3621098Z 2025-09-07T06:57:22.3621253Z _._ = None 2025-09-07T06:57:22.3621432Z ^ 2025-09-07T06:57:22.3621612Z warnings.warn(msg) 2025-09-07T06:57:22.3621816Z 2025-09-07T06:57:22.3622083Z --- Parse Warning: 8 / 17 --- 2025-09-07T06:57:22.3623037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: 
UserWarning: Cannot scrape callname=register_parametrization in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/parametrize.py line=424. 2025-09-07T06:57:22.3624080Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3624507Z Register a parametrization to a tensor in a module. 2025-09-07T06:57:22.3624789Z 2025-09-07T06:57:22.3625105Z Assume that ``tensor_name="weight"`` for simplicity. When accessing ``module.weight``, 2025-09-07T06:57:22.3625643Z the module will return the parametrized version ``parametrization(module.weight)``. 2025-09-07T06:57:22.3626262Z If the original tensor requires a gradient, the backward pass will differentiate 2025-09-07T06:57:22.3626800Z through :attr:`parametrization`, and the optimizer will update the tensor accordingly. 2025-09-07T06:57:22.3627190Z 2025-09-07T06:57:22.3627525Z The first time that a module registers a parametrization, this function will add an attribute 2025-09-07T06:57:22.3628059Z ``parametrizations`` to the module of type :class:`~ParametrizationList`. 2025-09-07T06:57:22.3628414Z 2025-09-07T06:57:22.3628707Z The list of parametrizations on the tensor ``weight`` will be accessible under 2025-09-07T06:57:22.3629109Z ``module.parametrizations.weight``. 2025-09-07T06:57:22.3629368Z 2025-09-07T06:57:22.3629572Z The original tensor will be accessible under 2025-09-07T06:57:22.3629893Z ``module.parametrizations.weight.original``. 2025-09-07T06:57:22.3630167Z 2025-09-07T06:57:22.3630463Z Parametrizations may be concatenated by registering several parametrizations 2025-09-07T06:57:22.3630850Z on the same attribute. 
2025-09-07T06:57:22.3631062Z 2025-09-07T06:57:22.3631340Z The training mode of a registered parametrization is updated on registration 2025-09-07T06:57:22.3631745Z to match the training mode of the host module 2025-09-07T06:57:22.3632006Z 2025-09-07T06:57:22.3632341Z Parametrized parameters and buffers have an inbuilt caching system that can be activated 2025-09-07T06:57:22.3632774Z using the context manager :func:`cached`. 2025-09-07T06:57:22.3633030Z 2025-09-07T06:57:22.3633321Z A :attr:`parametrization` may optionally implement a method with signature 2025-09-07T06:57:22.3633756Z 2025-09-07T06:57:22.3633933Z .. code-block:: python 2025-09-07T06:57:22.3634152Z 2025-09-07T06:57:22.3634425Z def right_inverse(self, X: Tensor) -> Union[Tensor, Sequence[Tensor]] 2025-09-07T06:57:22.3634751Z 2025-09-07T06:57:22.3635055Z This method is called on the unparametrized tensor when the first parametrization 2025-09-07T06:57:22.3635607Z is registered to compute the initial value of the original tensor. 2025-09-07T06:57:22.3636172Z If this method is not implemented, the original tensor will be just the unparametrized tensor. 2025-09-07T06:57:22.3636571Z 2025-09-07T06:57:22.3636907Z If all the parametrizations registered on a tensor implement `right_inverse` it is possible 2025-09-07T06:57:22.3637471Z to initialize a parametrized tensor by assigning to it, as shown in the example below. 2025-09-07T06:57:22.3637849Z 2025-09-07T06:57:22.3638120Z It is possible for the first parametrization to depend on several inputs. 2025-09-07T06:57:22.3638587Z This may be implemented returning a tuple of tensors from ``right_inverse`` 2025-09-07T06:57:22.3639053Z (see the example implementation of a ``RankOne`` parametrization below). 2025-09-07T06:57:22.3639394Z 2025-09-07T06:57:22.3639760Z In this case, the unconstrained tensors are also located under ``module.parametrizations.weight`` 2025-09-07T06:57:22.3640230Z with names ``original0``, ``original1``,... 
2025-09-07T06:57:22.3640491Z 2025-09-07T06:57:22.3640661Z .. note:: 2025-09-07T06:57:22.3640841Z 2025-09-07T06:57:22.3641155Z If unsafe=False (default) both the forward and right_inverse methods will be called 2025-09-07T06:57:22.3641591Z once to perform a number of consistency checks. 2025-09-07T06:57:22.3642019Z If unsafe=True, then right_inverse will be called if the tensor is not parametrized, 2025-09-07T06:57:22.3642424Z and nothing will be called otherwise. 2025-09-07T06:57:22.3642682Z 2025-09-07T06:57:22.3642841Z .. note:: 2025-09-07T06:57:22.3643019Z 2025-09-07T06:57:22.3643271Z In most situations, ``right_inverse`` will be a function such that 2025-09-07T06:57:22.3643624Z ``forward(right_inverse(X)) == X`` (see 2025-09-07T06:57:22.3644126Z `right inverse `_). 2025-09-07T06:57:22.3644658Z Sometimes, when the parametrization is not surjective, it may be reasonable 2025-09-07T06:57:22.3645035Z to relax this. 2025-09-07T06:57:22.3645240Z 2025-09-07T06:57:22.3645406Z .. warning:: 2025-09-07T06:57:22.3645599Z 2025-09-07T06:57:22.3645919Z If a parametrization depends on several inputs, :func:`~register_parametrization` 2025-09-07T06:57:22.3646448Z will register a number of new parameters. If such parametrization is registered 2025-09-07T06:57:22.3646979Z after the optimizer is created, these new parameters will need to be added manually 2025-09-07T06:57:22.3647455Z to the optimizer. See :meth:`torch.Optimizer.add_param_group`. 
2025-09-07T06:57:22.3647779Z 2025-09-07T06:57:22.3647949Z Args: 2025-09-07T06:57:22.3648244Z module (nn.Module): module on which to register the parametrization 2025-09-07T06:57:22.3648689Z tensor_name (str): name of the parameter or buffer on which to register 2025-09-07T06:57:22.3649057Z the parametrization 2025-09-07T06:57:22.3649400Z parametrization (nn.Module): the parametrization to register 2025-09-07T06:57:22.3649743Z Keyword args: 2025-09-07T06:57:22.3650043Z unsafe (bool): a boolean flag that denotes whether the parametrization 2025-09-07T06:57:22.3650480Z may change the dtype and shape of the tensor. Default: `False` 2025-09-07T06:57:22.3650951Z Warning: the parametrization is not checked for consistency upon registration. 2025-09-07T06:57:22.3651447Z Enable this flag at your own risk. 2025-09-07T06:57:22.3651708Z 2025-09-07T06:57:22.3651876Z Raises: 2025-09-07T06:57:22.3652233Z ValueError: if the module does not have a parameter or a buffer named :attr:`tensor_name` 2025-09-07T06:57:22.3652627Z 2025-09-07T06:57:22.3652796Z Examples: 2025-09-07T06:57:22.3653117Z >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_LAPACK) 2025-09-07T06:57:22.3653495Z >>> import torch 2025-09-07T06:57:22.3653734Z >>> import torch.nn as nn 2025-09-07T06:57:22.3654080Z >>> import torch.nn.utils.parametrize as P 2025-09-07T06:57:22.3654357Z >>> 2025-09-07T06:57:22.3654562Z >>> class Symmetric(nn.Module): 2025-09-07T06:57:22.3654831Z >>> def forward(self, X): 2025-09-07T06:57:22.3655149Z >>> return X.triu() + X.triu(1).T # Return a symmetric matrix 2025-09-07T06:57:22.3655458Z >>> 2025-09-07T06:57:22.3655662Z >>> def right_inverse(self, A): 2025-09-07T06:57:22.3655926Z >>> return A.triu() 2025-09-07T06:57:22.3656159Z >>> 2025-09-07T06:57:22.3656344Z >>> m = nn.Linear(5, 5) 2025-09-07T06:57:22.3656647Z >>> P.register_parametrization(m, "weight", Symmetric()) 2025-09-07T06:57:22.3657070Z >>> print(torch.allclose(m.weight, m.weight.T)) # m.weight is now symmetric 
2025-09-07T06:57:22.3657429Z True 2025-09-07T06:57:22.3657629Z >>> A = torch.rand(5, 5) 2025-09-07T06:57:22.3657891Z >>> A = A + A.T # A is now symmetric 2025-09-07T06:57:22.3658242Z >>> m.weight = A # Initialize the weight to be the symmetric matrix A 2025-09-07T06:57:22.3658604Z >>> print(torch.allclose(m.weight, A)) 2025-09-07T06:57:22.3658872Z True 2025-09-07T06:57:22.3659059Z 2025-09-07T06:57:22.3659246Z >>> class RankOne(nn.Module): 2025-09-07T06:57:22.3659523Z >>> def forward(self, x, y): 2025-09-07T06:57:22.3659827Z >>> # Form a rank 1 matrix multiplying two vectors 2025-09-07T06:57:22.3660159Z >>> return x.unsqueeze(-1) @ y.unsqueeze(-2) 2025-09-07T06:57:22.3660426Z >>> 2025-09-07T06:57:22.3660637Z >>> def right_inverse(self, Z): 2025-09-07T06:57:22.3661019Z >>> # Project Z onto the rank 1 matrices 2025-09-07T06:57:22.3661352Z >>> U, S, Vh = torch.linalg.svd(Z, full_matrices=False) 2025-09-07T06:57:22.3661681Z >>> # Return rescaled singular vectors 2025-09-07T06:57:22.3661979Z >>> s0_sqrt = S[0].sqrt().unsqueeze(-1) 2025-09-07T06:57:22.3662294Z >>> return U[..., :, 0] * s0_sqrt, Vh[..., 0, :] * s0_sqrt 2025-09-07T06:57:22.3662586Z >>> 2025-09-07T06:57:22.3662826Z >>> linear_rank_one = P.register_parametrization( 2025-09-07T06:57:22.3663152Z ... nn.Linear(4, 4), "weight", RankOne() 2025-09-07T06:57:22.3663422Z ... 
) 2025-09-07T06:57:22.3663706Z >>> print(torch.linalg.matrix_rank(linear_rank_one.weight).item()) 2025-09-07T06:57:22.3664030Z 1 2025-09-07T06:57:22.3664211Z 2025-09-07T06:57:22.3664376Z 2025-09-07T06:57:22.3664904Z Original Error: IndentationError('expected an indented block after function definition on line 2', ('', 3, 0, '_._ = None\n', 3, -1)) 2025-09-07T06:57:22.3665484Z 2025-09-07T06:57:22.3665651Z _._ = None 2025-09-07T06:57:22.3665837Z ^ 2025-09-07T06:57:22.3666020Z warnings.warn(msg) 2025-09-07T06:57:22.3666233Z 2025-09-07T06:57:22.3666482Z --- Parse Warning: 9 / 17 --- 2025-09-07T06:57:22.3667435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=DeviceMesh.__getitem__ in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/device_mesh.py line=701. 2025-09-07T06:57:22.3668560Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3668920Z 2025-09-07T06:57:22.3669237Z Slice the current DeviceMesh based on the mesh_dim_names given to create a submesh. 2025-09-07T06:57:22.3669765Z The submesh created consists of the dimensions and the communicators indicated by 2025-09-07T06:57:22.3670156Z ``mesh_dim_names`` 2025-09-07T06:57:22.3670443Z 2025-09-07T06:57:22.3670698Z Args: 2025-09-07T06:57:22.3670995Z mesh_dim_names (Union[str, Tuple[str]]): the name or the tuple of names of the 2025-09-07T06:57:22.3671437Z mesh dimension of the DeviceMesh to create the submesh for. 2025-09-07T06:57:22.3671762Z Returns: 2025-09-07T06:57:22.3671962Z A :class:`DeviceMesh` object 2025-09-07T06:57:22.3672200Z 2025-09-07T06:57:22.3672516Z The following program runs on each process/rank in an SPMD manner in a world size of 8. 2025-09-07T06:57:22.3672934Z In the first example: 2025-09-07T06:57:22.3673291Z Calling mesh_2d["tp"] on rank 0, 1, 2, 3 returns a 1D submesh of DeviceMesh:([0, 1, 2, 3]). 
2025-09-07T06:57:22.3673787Z Calling mesh_2d["tp"] on rank 4, 5, 6, 7 returns a 1D submesh of DeviceMesh:([4, 5, 6, 7]). 2025-09-07T06:57:22.3674260Z Calling mesh_2d["dp"] on rank 0, 4 returns a 1D submesh of DeviceMesh:([0, 4]). 2025-09-07T06:57:22.3674713Z Calling mesh_2d["dp"] on rank 1, 5 returns a 1D submesh of DeviceMesh:([1, 5]). 2025-09-07T06:57:22.3675163Z Calling mesh_2d["dp"] on rank 2, 6 returns a 1D submesh of DeviceMesh:([2, 6]). 2025-09-07T06:57:22.3675605Z Calling mesh_2d["dp"] on rank 3, 7 returns a 1D submesh of DeviceMesh:([3, 7]). 2025-09-07T06:57:22.3675946Z 2025-09-07T06:57:22.3676132Z In the second example: 2025-09-07T06:57:22.3676497Z Calling mesh_3d["dp", "cp"] on rank 0, 1, 4, 5 returns a 2D submesh of DeviceMesh:([[0, 1], [4, 5]]). 2025-09-07T06:57:22.3677010Z Calling mesh_3d["dp", "cp"] on rank 2, 3, 6, 7 returns a 2D submesh of DeviceMesh:([[2, 3], [6, 7]]). 2025-09-07T06:57:22.3677511Z Calling mesh_3d["cp", "dp"] on rank 0, 1, 4, 5 returns a 2D submesh of DeviceMesh:([[0, 4], [1, 5]]). 2025-09-07T06:57:22.3678011Z Calling mesh_3d["cp", "dp"] on rank 2, 3, 6, 7 returns a 2D submesh of DeviceMesh:([[2, 6], [3, 7]]). 2025-09-07T06:57:22.3678459Z 2025-09-07T06:57:22.3678643Z Example:: 2025-09-07T06:57:22.3678822Z 2025-09-07T06:57:22.3679015Z >>> # xdoctest: +SKIP("no rank") 2025-09-07T06:57:22.3679341Z >>> from torch.distributed.device_mesh import DeviceMesh 2025-09-07T06:57:22.3679647Z >>> 2025-09-07T06:57:22.3679919Z >>> # Initialize a 2D device mesh as (2, 4) to represent the topology 2025-09-07T06:57:22.3680289Z >>> # of cross-host(dim 0), and within-host (dim 1). 2025-09-07T06:57:22.3680707Z >>> mesh_2d = init_device_mesh(device_type="cuda", (2,4), mesh_dim_names=("dp", "tp")) 2025-09-07T06:57:22.3681092Z >>> tp_mesh = mesh_2d["tp"] 2025-09-07T06:57:22.3681338Z >>> dp_mesh = mesh_2d["dp"] 2025-09-07T06:57:22.3681564Z >>> 2025-09-07T06:57:22.3681754Z >>> # Initialize a 3D mesh. 
2025-09-07T06:57:22.3682129Z >>> mesh_3d = init_device_mesh(device_type="cuda", (2,2,2), mesh_dim_names=("dp", "pp", "cp")) 2025-09-07T06:57:22.3682696Z >>> # The order of the mesh_dim_names provided deteremines the order of dimensions in the submesh. 2025-09-07T06:57:22.3683130Z >>> dp_cp_mesh = mesh_3d["dp", "cp"] 2025-09-07T06:57:22.3683405Z >>> cp_dp_mesh = mesh_3d["cp", "dp"] 2025-09-07T06:57:22.3683648Z 2025-09-07T06:57:22.3684277Z Original Error: SyntaxError('positional argument follows keyword argument', ('', 6, 82, 'mesh_2d = init_device_mesh(device_type="cuda", (2,4), mesh_dim_names=("dp", "tp"))\n', 6, 83)) 2025-09-07T06:57:22.3684958Z 2025-09-07T06:57:22.3685252Z mesh_2d = init_device_mesh(device_type="cuda", (2,4), mesh_dim_names=("dp", "tp")) 2025-09-07T06:57:22.3685734Z ^ 2025-09-07T06:57:22.3686016Z warnings.warn(msg) 2025-09-07T06:57:22.3686230Z 2025-09-07T06:57:22.3686482Z --- Parse Warning: 10 / 17 --- 2025-09-07T06:57:22.3687507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=SavePlanner in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py line=122. 2025-09-07T06:57:22.3688602Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3688964Z 2025-09-07T06:57:22.3689286Z Abstract class defining the protocol used by save_state_dict to plan the save process. 2025-09-07T06:57:22.3689681Z 2025-09-07T06:57:22.3690018Z SavePlanners are stateful objects that can be used to customize the whole save process. 2025-09-07T06:57:22.3690420Z 2025-09-07T06:57:22.3690741Z SavePlanner acts as an access proxy to the state_dict, so any transformation done to it 2025-09-07T06:57:22.3691156Z will be visible to the whole process. 
2025-09-07T06:57:22.3691409Z 2025-09-07T06:57:22.3691731Z A planner subclass can expect the following sequence of calls during save_state_dict: 2025-09-07T06:57:22.3692120Z 2025-09-07T06:57:22.3692322Z 1) set_up_planner - called on all ranks. 2025-09-07T06:57:22.3692626Z Signals the start of a checkpoint save. 2025-09-07T06:57:22.3692897Z 2025-09-07T06:57:22.3693094Z 2) create_local_plan - called on all ranks. 2025-09-07T06:57:22.3693524Z Process the state_dict and produces a `SavePlan` that will be sent for global planning. 2025-09-07T06:57:22.3693995Z 2025-09-07T06:57:22.3694247Z 3) create_global_plan - called on the coordinator rank only. 2025-09-07T06:57:22.3694653Z Takes the SavePlan from all ranks and make any global decision. 2025-09-07T06:57:22.3694982Z 2025-09-07T06:57:22.3695181Z 4) finish_plan - called on all ranks. 2025-09-07T06:57:22.3695546Z This gives each rank a chance to adjust to global planning decisions. 2025-09-07T06:57:22.3695884Z 2025-09-07T06:57:22.3696108Z 5) resolve_data - called multiple times on each rank 2025-09-07T06:57:22.3696589Z Lookups a value on the `state_dict` for the storage layer to write. 2025-09-07T06:57:22.3696926Z 2025-09-07T06:57:22.3697259Z Users are recommended to extend DefaultSavePlanner instead of this interface directly as 2025-09-07T06:57:22.3697750Z most changes can be expressed by changes in a single method. 2025-09-07T06:57:22.3698058Z 2025-09-07T06:57:22.3698252Z There are 3 usual patterns of extension: 2025-09-07T06:57:22.3698507Z 2025-09-07T06:57:22.3698809Z Rewriting state_dict. 
This is the simplest way to extend the save process as it 2025-09-07T06:57:22.3699295Z doesn't requite understanding the intrincacies of how SavePlan works: 2025-09-07T06:57:22.3699642Z 2025-09-07T06:57:22.3699833Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:22.3700130Z >>> class RenamePlanner(DefaultSavePlanner): 2025-09-07T06:57:22.3700420Z >>> def set_up_planner( 2025-09-07T06:57:22.3700644Z >>> self, 2025-09-07T06:57:22.3700865Z >>> state_dict: STATE_DICT_TYPE, 2025-09-07T06:57:22.3701159Z >>> storage_meta: Optional[StorageMeta], 2025-09-07T06:57:22.3701447Z >>> is_coordinator: bool, 2025-09-07T06:57:22.3701703Z >>> ) -> None: 2025-09-07T06:57:22.3701927Z >>> # prefix all keys with `foo_`` 2025-09-07T06:57:22.3702351Z >>> super().set_up_planner({"foo_" + k: v for k, v in state_dict.items()}, storage_meta, is_coordinator) 2025-09-07T06:57:22.3702750Z 2025-09-07T06:57:22.3703113Z Modifying local plan and lookup in tandem. This is useful when fine control of how data is persisted 2025-09-07T06:57:22.3703544Z 2025-09-07T06:57:22.3703829Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:22.3704126Z >>> class FP16Planner(DefaultSavePlanner): 2025-09-07T06:57:22.3704407Z >>> def create_local_plan(self): 2025-09-07T06:57:22.3704692Z >>> plan = super().create_local_plan() 2025-09-07T06:57:22.3704974Z >>> for p in plan: 2025-09-07T06:57:22.3705228Z >>> if p.tensor_data is not None: 2025-09-07T06:57:22.3705646Z >>> p.tensor_data.properties.dtype = torch.float16 2025-09-07T06:57:22.3706043Z >>> return plan 2025-09-07T06:57:22.3706274Z >>> 2025-09-07T06:57:22.3706487Z >>> def resolve_data(self, write_item): 2025-09-07T06:57:22.3706792Z >>> item = super().resolve_data(write_item) 2025-09-07T06:57:22.3707215Z >>> return item if write_item.type == WriteItemType.BYTE_IO else item.to(torch.float16) 2025-09-07T06:57:22.3707610Z 2025-09-07T06:57:22.3707982Z Using the global planning step to make central decisions that can't be made individually by each rank 
2025-09-07T06:57:22.3708429Z 2025-09-07T06:57:22.3708634Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:22.3708922Z >>> from itertools import zip_longest 2025-09-07T06:57:22.3709206Z >>> from dataclasses import replace 2025-09-07T06:57:22.3709521Z >>> class DDPLoadBalancingPlanner(DefaultSavePlanner): 2025-09-07T06:57:22.3709976Z >>> # This uses the default local plan behavior of having all non-sharded writes in rank 0 2025-09-07T06:57:22.3710414Z >>> # This sample doesn't handle ShardedTensors 2025-09-07T06:57:22.3710736Z >>> def create_global_plan(self, all_plans): 2025-09-07T06:57:22.3711075Z >>> iters = [iter(all_plans[0].items)] * len(all_plans) 2025-09-07T06:57:22.3711392Z >>> items_per_rank = [ 2025-09-07T06:57:22.3711676Z >>> [item for item in items if item is not None] 2025-09-07T06:57:22.3712019Z >>> for items in zip(*zip_longest(*iters), strict=True) 2025-09-07T06:57:22.3712309Z >>> ] 2025-09-07T06:57:22.3712505Z >>> all_plans = [ 2025-09-07T06:57:22.3712767Z >>> replace(plan, items=items) 2025-09-07T06:57:22.3713134Z >>> for plan, items in zip(all_plans, items_per_rank, strict=True) 2025-09-07T06:57:22.3713459Z >>> ] 2025-09-07T06:57:22.3713776Z >>> return super().create_global_plan(all_plans) 2025-09-07T06:57:22.3714061Z 2025-09-07T06:57:22.3714376Z Finally, some planners need to save additional metadata in the checkpoint, this is 2025-09-07T06:57:22.3714903Z accomplished by having each rank contribute their data items in the local plan and 2025-09-07T06:57:22.3715310Z the global planner aggregate them: 2025-09-07T06:57:22.3715559Z 2025-09-07T06:57:22.3715750Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:22.3716060Z >>> class SaveExtraDataPlanner(DefaultSavePlanner): 2025-09-07T06:57:22.3716396Z >>> def create_local_plan(self) -> SavePlan: 2025-09-07T06:57:22.3716697Z >>> plan = super().create_local_plan() 2025-09-07T06:57:22.3717024Z >>> return replace(plan, planner_data="per-rank-data") 2025-09-07T06:57:22.3717320Z >>> 
2025-09-07T06:57:22.3717662Z >>> def create_global_plan(self, all_plans: List[SavePlan]) -> Tuple[List[SavePlan], Metadata]: 2025-09-07T06:57:22.3718162Z >>> global_plan, metadata = super().create_global_plan(all_plans) 2025-09-07T06:57:22.3718551Z >>> merged_data = [p.planner_data for p in global_plan] 2025-09-07T06:57:22.3718916Z >>> metadata = replace(metadata, planner_data=merged_data) 2025-09-07T06:57:22.3719244Z >>> return global_plan, metadata 2025-09-07T06:57:22.3719496Z 2025-09-07T06:57:22.3720011Z Original Error: IndentationError('expected an indented block after function definition on line 3', ('', 9, 0, '_._ = None\n', 9, -1)) 2025-09-07T06:57:22.3720585Z 2025-09-07T06:57:22.3720836Z _._ = None 2025-09-07T06:57:22.3721021Z ^ 2025-09-07T06:57:22.3721207Z warnings.warn(msg) 2025-09-07T06:57:22.3721421Z 2025-09-07T06:57:22.3721671Z --- Parse Warning: 11 / 17 --- 2025-09-07T06:57:22.3722635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=LoadPlanner in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py line=305. 2025-09-07T06:57:22.3723833Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3724209Z 2025-09-07T06:57:22.3724540Z Abstract class defining the protocol used by load_state_dict to plan the load process. 2025-09-07T06:57:22.3724936Z 2025-09-07T06:57:22.3725258Z LoadPlanner are stateful objects that can be used to customize the whole load process. 2025-09-07T06:57:22.3725648Z 2025-09-07T06:57:22.3725959Z LoadPlanner acts as an access proxy to the state_dict, so any transformation done to it 2025-09-07T06:57:22.3726375Z will be visible to the whole process. 2025-09-07T06:57:22.3726621Z 2025-09-07T06:57:22.3726942Z A planner subclass can expect the following sequence of calls during load_state_dict: 2025-09-07T06:57:22.3727325Z 2025-09-07T06:57:22.3727521Z 1) set_up_planner - called on all ranks. 
2025-09-07T06:57:22.3727819Z Signals the start of loading a checkpoint. 2025-09-07T06:57:22.3728089Z 2025-09-07T06:57:22.3728289Z 2) create_local_plan - called on all ranks. 2025-09-07T06:57:22.3728715Z Process the state_dict and produces a `LoadPlan` that will be sent for global planning. 2025-09-07T06:57:22.3729105Z 2025-09-07T06:57:22.3729347Z 3) create_global_plan - called on the coordinator rank only. 2025-09-07T06:57:22.3729745Z Takes the LoadPlan from all ranks and make any global decision. 2025-09-07T06:57:22.3730065Z 2025-09-07T06:57:22.3730284Z 4) load_bytes - called multiple times on each rank 2025-09-07T06:57:22.3730640Z This is called once per non-tensor value in state_dict. 2025-09-07T06:57:22.3730938Z 2025-09-07T06:57:22.3731213Z 5) resolve_tensor and commit_tensor - called multiple times on each rank 2025-09-07T06:57:22.3731641Z They are called in pair for each Tensor value in state_dict. 2025-09-07T06:57:22.3731952Z 2025-09-07T06:57:22.3732384Z Users are recommended to extend DefaultLoadPlanner instead of this interface directly as 2025-09-07T06:57:22.3732898Z most changes can be expressed by changes in a single method. 2025-09-07T06:57:22.3733210Z 2025-09-07T06:57:22.3733408Z There are two usual patterns of extension: 2025-09-07T06:57:22.3733672Z 2025-09-07T06:57:22.3734043Z Rewriting state_dict. This is the simplest way to extend the load process as it 2025-09-07T06:57:22.3734553Z doesn't requite understanding the intrincacies of how LoadPlan works. 
We need 2025-09-07T06:57:22.3735041Z to keep a reference to the original state_dict as load happens in place so 2025-09-07T06:57:22.3735427Z we need to be able to perform it in place 2025-09-07T06:57:22.3735687Z 2025-09-07T06:57:22.3735883Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:22.3736192Z >>> class RenamePlanner(DefaultLoadPlanner): 2025-09-07T06:57:22.3736489Z >>> def set_up_planner( 2025-09-07T06:57:22.3736727Z >>> self, 2025-09-07T06:57:22.3736956Z >>> state_dict: STATE_DICT_TYPE, 2025-09-07T06:57:22.3737227Z >>> metadata: Metadata, 2025-09-07T06:57:22.3737495Z >>> is_coordinator: bool, 2025-09-07T06:57:22.3737749Z >>> ) -> None: 2025-09-07T06:57:22.3737983Z >>> self.original_state_dict = state_dict 2025-09-07T06:57:22.3738341Z >>> state_dict = {"foo_" + k: v for k, v in state_dict.items()} 2025-09-07T06:57:22.3738656Z >>> 2025-09-07T06:57:22.3738865Z >>> if self.flatten_sharded_tensors: 2025-09-07T06:57:22.3739180Z >>> state_dict = _flatten_sharded_tensors(state_dict) 2025-09-07T06:57:22.3739575Z >>> 2025-09-07T06:57:22.3739767Z >>> if self.flatten_state_dict: 2025-09-07T06:57:22.3740095Z >>> state_dict, self.mappings = flatten_state_dict(state_dict) 2025-09-07T06:57:22.3740402Z >>> 2025-09-07T06:57:22.3740595Z >>> self.state_dict = state_dict 2025-09-07T06:57:22.3740864Z >>> self.metadata = metadata 2025-09-07T06:57:22.3741229Z >>> self.is_coordinator = is_coordinator 2025-09-07T06:57:22.3741568Z >>> 2025-09-07T06:57:22.3741774Z >>> def load_bytes(self, read_item, value): 2025-09-07T06:57:22.3742058Z >>> # Remove the "foo_" prefix 2025-09-07T06:57:22.3742483Z >>> self.original_state_dict[read_item.dest_index.fqn[4:]] = torch.load(value, weights_only=False) 2025-09-07T06:57:22.3742903Z 2025-09-07T06:57:22.3743069Z 2025-09-07T06:57:22.3743378Z Modifying resolve_tensor and commit_tensor to handle load time transformation. 
2025-09-07T06:57:22.3743754Z 2025-09-07T06:57:22.3743936Z >>> # xdoctest: +SKIP("undefined vars") 2025-09-07T06:57:22.3744256Z >>> class MetaModelMaterialize(DefaultSavePlanner): 2025-09-07T06:57:22.3744578Z >>> def resolve_tensor(self, read_item): 2025-09-07T06:57:22.3744881Z >>> tensor = super().resolve_tensor(read_item) 2025-09-07T06:57:22.3745213Z >>> return torch.empty_like(tensor, device="cpu") 2025-09-07T06:57:22.3745498Z >>> 2025-09-07T06:57:22.3745718Z >>> def commit_tensor(self, read_item, tensor): 2025-09-07T06:57:22.3746053Z >>> self.state_dict[read_item.dest_index.fqn] = tensor 2025-09-07T06:57:22.3746346Z 2025-09-07T06:57:22.3746877Z Original Error: IndentationError('expected an indented block after function definition on line 22', ('', 23, 0, '_._ = None\n', 23, -1)) 2025-09-07T06:57:22.3747461Z 2025-09-07T06:57:22.3747636Z _._ = None 2025-09-07T06:57:22.3747810Z ^ 2025-09-07T06:57:22.3748001Z warnings.warn(msg) 2025-09-07T06:57:22.3748214Z 2025-09-07T06:57:22.3748472Z --- Parse Warning: 12 / 17 --- 2025-09-07T06:57:22.3749413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=FullStateDictConfig in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/api.py line=295. 2025-09-07T06:57:22.3750538Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3750915Z 2025-09-07T06:57:22.3751176Z ``FullStateDictConfig`` is a config class meant to be used with 2025-09-07T06:57:22.3751592Z ``StateDictType.FULL_STATE_DICT``. We recommend enabling both 2025-09-07T06:57:22.3752028Z ``offload_to_cpu=True`` and ``rank0_only=True`` when saving full state 2025-09-07T06:57:22.3752477Z dicts to save GPU memory and CPU memory, respectively. 
This config class 2025-09-07T06:57:22.3752919Z is meant to be used via the :func:`state_dict_type` context manager as 2025-09-07T06:57:22.3753261Z follows: 2025-09-07T06:57:22.3753438Z 2025-09-07T06:57:22.3753654Z >>> # xdoctest: +SKIP("undefined variables") 2025-09-07T06:57:22.3754056Z >>> from torch.distributed.fsdp import FullyShardedDataParallel as FSDP 2025-09-07T06:57:22.3754449Z >>> fsdp = FSDP(model, auto_wrap_policy=...) 2025-09-07T06:57:22.3754815Z >>> cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True) 2025-09-07T06:57:22.3755258Z >>> with FSDP.state_dict_type(fsdp, StateDictType.FULL_STATE_DICT, cfg): 2025-09-07T06:57:22.3755625Z >>> state = fsdp.state_dict() 2025-09-07T06:57:22.3755982Z >>> # `state` will be empty on non rank 0 and contain CPU tensors on rank 0. 2025-09-07T06:57:22.3756450Z >>> # To reload checkpoint for inference, finetuning, transfer learning, etc: 2025-09-07T06:57:22.3756929Z >>> model = model_fn() # Initialize model in preparation for wrapping with FSDP 2025-09-07T06:57:22.3757393Z >>> if dist.get_rank() == 0: 2025-09-07T06:57:22.3757723Z >>> # Load checkpoint only on rank 0 to avoid memory redundancy 2025-09-07T06:57:22.3758092Z >>> state_dict = torch.load("my_checkpoint.pt") 2025-09-07T06:57:22.3758402Z >>> model.load_state_dict(state_dict) 2025-09-07T06:57:22.3758791Z >>> # All ranks initialize FSDP module as usual. `sync_module_states` argument 2025-09-07T06:57:22.3759339Z >>> # communicates loaded checkpoint states from rank 0 to rest of the world. 2025-09-07T06:57:22.3759772Z >>> fsdp = FSDP( 2025-09-07T06:57:22.3759991Z ... model, 2025-09-07T06:57:22.3760237Z ... device_id=torch.cuda.current_device(), 2025-09-07T06:57:22.3760530Z ... auto_wrap_policy=..., 2025-09-07T06:57:22.3760802Z ... sync_module_states=True, 2025-09-07T06:57:22.3761055Z ... ) 2025-09-07T06:57:22.3761350Z >>> # After this point, all ranks have FSDP model with loaded checkpoint. 
2025-09-07T06:57:22.3761699Z 2025-09-07T06:57:22.3761875Z Attributes: 2025-09-07T06:57:22.3762157Z rank0_only (bool): If ``True``, then only rank 0 saves the full state 2025-09-07T06:57:22.3762588Z dict, and nonzero ranks save an empty dict. If ``False``, then all 2025-09-07T06:57:22.3762980Z ranks save the full state dict. (Default: ``False``) 2025-09-07T06:57:22.3763285Z 2025-09-07T06:57:22.3763789Z Original Error: IndentationError("expected an indented block after 'if' statement on line 10", ('', 11, 1, '_._ = None\n', 11, 2)) 2025-09-07T06:57:22.3764346Z 2025-09-07T06:57:22.3764522Z _._ = None 2025-09-07T06:57:22.3764713Z ^ 2025-09-07T06:57:22.3764898Z warnings.warn(msg) 2025-09-07T06:57:22.3765113Z 2025-09-07T06:57:22.3765369Z --- Parse Warning: 13 / 17 --- 2025-09-07T06:57:22.3766318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=unsafe_generate_fake_kernels in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/fake_profile.py line=94. 2025-09-07T06:57:22.3767359Z Caused by: DoctestParseError('Failed to parse doctest in _label_docsrc_lines') 2025-09-07T06:57:22.3767726Z 2025-09-07T06:57:22.3768003Z Registers a fake kernel based on the given operator profiles. This fake 2025-09-07T06:57:22.3768561Z kernel registration will override any existing fake kernel registrations. 2025-09-07T06:57:22.3768927Z 2025-09-07T06:57:22.3769193Z The input is a dictionary mapping operator names to a set of operator 2025-09-07T06:57:22.3769642Z profiles, which we will use to generate fake kernels. The operator profiles 2025-09-07T06:57:22.3770088Z are a record of the input and output tensor metadata. Based on this 2025-09-07T06:57:22.3770530Z information we will match a given input to the recorded profile, and return 2025-09-07T06:57:22.3770986Z an output with the same metadata as in the recorded profile. 
If a profile 2025-09-07T06:57:22.3771369Z doesn't exist then an exception will be thrown. 2025-09-07T06:57:22.3771638Z 2025-09-07T06:57:22.3771921Z The fake kernel generation is considered unsafe because it relies on the 2025-09-07T06:57:22.3772380Z rigid, pre-defined operator profiles that do not account for potential 2025-09-07T06:57:22.3772844Z variations in output behavior. Specifically, the generated kernels assume a 2025-09-07T06:57:22.3773335Z fixed relationship between input and output ranks. However, in reality, it's 2025-09-07T06:57:22.3773823Z possible that data-dependent operations may produce outputs of different 2025-09-07T06:57:22.3774336Z ranks even when given inputs of the same rank. The generated fake kernels 2025-09-07T06:57:22.3774780Z are inflexible and unable to accommodate these nuances, making them 2025-09-07T06:57:22.3775124Z potentially unsafe. 2025-09-07T06:57:22.3775329Z 2025-09-07T06:57:22.3775495Z Args: 2025-09-07T06:57:22.3775869Z op_profiles (dict[str, set[OpProfile]]): A dictionary mapping operator 2025-09-07T06:57:22.3776295Z name to a set of operator profiles from which we will generate fake 2025-09-07T06:57:22.3776616Z kernels. 
2025-09-07T06:57:22.3776803Z 2025-09-07T06:57:22.3776964Z Examples: 2025-09-07T06:57:22.3777142Z 2025-09-07T06:57:22.3777381Z >>> # Example: Registering an op-profile from draft-export 2025-09-07T06:57:22.3777779Z >>> import torch 2025-09-07T06:57:22.3778125Z >>> from torch.export._draft_export import draft_export 2025-09-07T06:57:22.3778417Z >>> 2025-09-07T06:57:22.3778670Z >>> @torch.library.custom_op("mylib::foo", mutates_args=()) 2025-09-07T06:57:22.3779014Z >>> def foo(x: Tensor, y: Tensor) -> Tensor: 2025-09-07T06:57:22.3779281Z >>> return x + y 2025-09-07T06:57:22.3779501Z >>> 2025-09-07T06:57:22.3779691Z >>> class M(torch.nn.Module): 2025-09-07T06:57:22.3779948Z >>> def forward(self, a, b): 2025-09-07T06:57:22.3780248Z >>> res = torch.ops.mylib.foo(a, b) # no fake impl 2025-09-07T06:57:22.3780540Z >>> return res 2025-09-07T06:57:22.3780754Z >>> 2025-09-07T06:57:22.3780997Z >>> ep = draft_export(M(), (torch.ones(3, 4), torch.ones(3, 4)) 2025-09-07T06:57:22.3781297Z >>> 2025-09-07T06:57:22.3781638Z >>> with torch._library.fake_profile.unsafe_generate_fake_kernels(ep._report.op_profiles): 2025-09-07T06:57:22.3782080Z >>> decomp = ep.run_decompositions() 2025-09-07T06:57:22.3782353Z 2025-09-07T06:57:22.3782511Z 2025-09-07T06:57:22.3782986Z Original Error: IncompleteParseError('ill-formed doctest: all parts have been processed but the doctest source is not balanced') 2025-09-07T06:57:22.3783519Z 2025-09-07T06:57:22.3783682Z warnings.warn(msg) 2025-09-07T06:57:22.3783889Z 2025-09-07T06:57:22.3784138Z --- Parse Warning: 14 / 17 --- 2025-09-07T06:57:22.3785083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=CustomOpDef.register_fake in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_library/custom_ops.py line=397. 
2025-09-07T06:57:22.3786102Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3786679Z Register a FakeTensor implementation for this custom op. 2025-09-07T06:57:22.3786988Z 2025-09-07T06:57:22.3787282Z This is necessary to get the operator to work efficiently with torch.compile. 2025-09-07T06:57:22.3787637Z 2025-09-07T06:57:22.3787904Z The Fake impl (sometimes also known as a meta kernel or abstract impl) 2025-09-07T06:57:22.3788351Z specifies the behavior of this operator on Tensors that carry no data. 2025-09-07T06:57:22.3788749Z Given some input Tensors with certain properties 2025-09-07T06:57:22.3789168Z (sizes/strides/storage_offset/device), it specifies what the properties of 2025-09-07T06:57:22.3789557Z the output Tensors are. 2025-09-07T06:57:22.3789779Z 2025-09-07T06:57:22.3790034Z Please see :func:`torch.library.register_fake` for more details. 2025-09-07T06:57:22.3790362Z 2025-09-07T06:57:22.3790520Z Args: 2025-09-07T06:57:22.3790778Z fn (Callable): The function to register as the FakeTensor 2025-09-07T06:57:22.3791101Z implementation. 
2025-09-07T06:57:22.3791324Z 2025-09-07T06:57:22.3791490Z Examples: 2025-09-07T06:57:22.3791699Z >>> import torch 2025-09-07T06:57:22.3791949Z >>> import numpy as np 2025-09-07T06:57:22.3792203Z >>> from torch import Tensor 2025-09-07T06:57:22.3792445Z >>> 2025-09-07T06:57:22.3792715Z >>> # Example 1: an operator without data-dependent output shape 2025-09-07T06:57:22.3793124Z >>> @torch.library.custom_op("mylib::linear", mutates_args=()) 2025-09-07T06:57:22.3793631Z >>> def linear(x: Tensor, weight: Tensor, bias: Tensor) -> Tensor: 2025-09-07T06:57:22.3793974Z >>> return (x @ weight.t()) + bias 2025-09-07T06:57:22.3794232Z >>> 2025-09-07T06:57:22.3794438Z >>> @linear.register_fake 2025-09-07T06:57:22.3794701Z >>> def _(x, weight, bias): 2025-09-07T06:57:22.3795054Z >>> assert x.dim() == 2 2025-09-07T06:57:22.3795380Z >>> assert weight.dim() == 2 2025-09-07T06:57:22.3795652Z >>> assert bias.dim() == 1 2025-09-07T06:57:22.3795938Z >>> assert x.shape[1] == weight.shape[1] 2025-09-07T06:57:22.3796243Z >>> assert weight.shape[0] == bias.shape[0] 2025-09-07T06:57:22.3796549Z >>> assert x.device == weight.device 2025-09-07T06:57:22.3796864Z >>> return x.new_empty(x.size(0), weight.size(0)) 2025-09-07T06:57:22.3797146Z >>> 2025-09-07T06:57:22.3797345Z >>> x = torch.randn(2, 2) 2025-09-07T06:57:22.3797607Z >>> weight = torch.randn(2, 2) 2025-09-07T06:57:22.3797868Z >>> bias = torch.randn(2) 2025-09-07T06:57:22.3798152Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:22.3798515Z >>> out = torch.compile(linear, fullgraph=True)(x, weight, bias) 2025-09-07T06:57:22.3798874Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:22.3799273Z >>> assert torch.allclose(out, torch.nn.functional.linear(x, weight, bias)) 2025-09-07T06:57:22.3799641Z >>> 2025-09-07T06:57:22.3799895Z >>> # Example 2: an operator with data-dependent output shape 2025-09-07T06:57:22.3800294Z >>> @torch.library.custom_op("mylib::nonzero", mutates_args=()) 
2025-09-07T06:57:22.3800646Z >>> def nonzero(x: Tensor) -> Tensor: 2025-09-07T06:57:22.3800931Z >>> x_np = x.cpu().numpy() 2025-09-07T06:57:22.3801215Z >>> res = np.stack(np.nonzero(x_np), axis=1) 2025-09-07T06:57:22.3801531Z >>> return torch.tensor(res, device=x.device) 2025-09-07T06:57:22.3801802Z >>> 2025-09-07T06:57:22.3802005Z >>> @nonzero.register_fake 2025-09-07T06:57:22.3802340Z >>> def _(x): 2025-09-07T06:57:22.3802621Z >>> # Number of nonzero-elements is data-dependent. 2025-09-07T06:57:22.3802971Z >>> # Since we cannot peek at the data in an abstract impl, 2025-09-07T06:57:22.3803319Z >>> # we use the ctx object to construct a new symint that 2025-09-07T06:57:22.3803651Z >>> # represents the data-dependent size. 2025-09-07T06:57:22.3803953Z >>> ctx = torch.library.get_ctx() 2025-09-07T06:57:22.3804238Z >>> nnz = ctx.new_dynamic_size() 2025-09-07T06:57:22.3804515Z >>> shape = [nnz, x.dim()] 2025-09-07T06:57:22.3804651Z >>> result = x.new_empty(shape, dtype=torch.int64) 2025-09-07T06:57:22.3804738Z >>> return result 2025-09-07T06:57:22.3804815Z >>> 2025-09-07T06:57:22.3804916Z >>> x = torch.tensor([0, 1, 2, 0, 0, 1]) 2025-09-07T06:57:22.3805039Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:22.3805163Z >>> out = torch.compile(nonzero, fullgraph=True)(x) 2025-09-07T06:57:22.3805277Z >>> # xdoctest: +SKIP("Requires Python <= 3.11") 2025-09-07T06:57:22.3805390Z >>> assert torch.allclose(out, x.nonzero()) 2025-09-07T06:57:22.3805468Z 2025-09-07T06:57:22.3805540Z 2025-09-07T06:57:22.3805967Z Original Error: IndentationError('expected an indented block after function definition on line 36', ('', 37, 1, '_._ = None\n', 37, 2)) 2025-09-07T06:57:22.3806116Z 2025-09-07T06:57:22.3806193Z _._ = None 2025-09-07T06:57:22.3806259Z ^ 2025-09-07T06:57:22.3806344Z warnings.warn(msg) 2025-09-07T06:57:22.3806422Z 2025-09-07T06:57:22.3806575Z --- Parse Warning: 15 / 17 --- 2025-09-07T06:57:22.3807315Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=vmap in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/apis.py line=39. 2025-09-07T06:57:22.3807575Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3807650Z 2025-09-07T06:57:22.3807829Z vmap is the vectorizing map; ``vmap(func)`` returns a new function that 2025-09-07T06:57:22.3807996Z maps ``func`` over some dimension of the inputs. Semantically, vmap 2025-09-07T06:57:22.3808175Z pushes the map into PyTorch operations called by ``func``, effectively 2025-09-07T06:57:22.3808281Z vectorizing those operations. 2025-09-07T06:57:22.3808351Z 2025-09-07T06:57:22.3808528Z vmap is useful for handling batch dimensions: one can write a function 2025-09-07T06:57:22.3808689Z ``func`` that runs on examples and then lift it to a function that can 2025-09-07T06:57:22.3808859Z take batches of examples with ``vmap(func)``. vmap can also be used to 2025-09-07T06:57:22.3809000Z compute batched gradients when composed with autograd. 2025-09-07T06:57:22.3809074Z 2025-09-07T06:57:22.3809149Z .. note:: 2025-09-07T06:57:22.3809311Z :func:`torch.vmap` is aliased to :func:`torch.func.vmap` for 2025-09-07T06:57:22.3809428Z convenience. Use whichever one you'd like. 2025-09-07T06:57:22.3809504Z 2025-09-07T06:57:22.3809571Z Args: 2025-09-07T06:57:22.3809743Z func (function): A Python function that takes one or more arguments. 2025-09-07T06:57:22.3809845Z Must return one or more Tensors. 2025-09-07T06:57:22.3810006Z in_dims (int or nested structure): Specifies which dimension of the 2025-09-07T06:57:22.3810152Z inputs should be mapped over. ``in_dims`` should have a 2025-09-07T06:57:22.3810308Z structure like the inputs. If the ``in_dim`` for a particular 2025-09-07T06:57:22.3810461Z input is None, then that indicates there is no map dimension. 2025-09-07T06:57:22.3810539Z Default: 0. 
2025-09-07T06:57:22.3810771Z out_dims (int or Tuple[int]): Specifies where the mapped dimension 2025-09-07T06:57:22.3810930Z should appear in the outputs. If ``out_dims`` is a Tuple, then 2025-09-07T06:57:22.3811064Z it should have one element per output. Default: 0. 2025-09-07T06:57:22.3811218Z randomness (str): Specifies whether the randomness in this 2025-09-07T06:57:22.3811395Z vmap should be the same or different across batches. If 'different', 2025-09-07T06:57:22.3811554Z the randomness for each batch will be different. If 'same', the 2025-09-07T06:57:22.3811736Z randomness will be the same across batches. If 'error', any calls to 2025-09-07T06:57:22.3811905Z random functions will error. Default: 'error'. WARNING: this flag 2025-09-07T06:57:22.3812074Z only applies to random PyTorch operations and does not apply to 2025-09-07T06:57:22.3812187Z Python's random module or numpy randomness. 2025-09-07T06:57:22.3812380Z chunk_size (None or int): If None (default), apply a single vmap over inputs. 2025-09-07T06:57:22.3812556Z If not None, then compute the vmap :attr:`chunk_size` samples at a time. 2025-09-07T06:57:22.3812767Z Note that :attr:`chunk_size=1` is equivalent to computing the vmap with a for-loop. 2025-09-07T06:57:22.3812979Z If you run into memory issues computing the vmap, please try a non-None chunk_size. 2025-09-07T06:57:22.3813051Z 2025-09-07T06:57:22.3813125Z Returns: 2025-09-07T06:57:22.3813287Z Returns a new "batched" function. It takes the same inputs as 2025-09-07T06:57:22.3813517Z ``func``, except each input has an extra dimension at the index 2025-09-07T06:57:22.3813684Z specified by ``in_dims``. It takes returns the same outputs as 2025-09-07T06:57:22.3813929Z ``func``, except each output has an extra dimension at the index 2025-09-07T06:57:22.3814024Z specified by ``out_dims``. 2025-09-07T06:57:22.3814093Z 2025-09-07T06:57:22.3814168Z .. warning: 2025-09-07T06:57:22.3814424Z :func:`vmap` works best with functional-style code. 
Please do not 2025-09-07T06:57:22.3814649Z perform any side-effects in ``func``, with the exception of 2025-09-07T06:57:22.3814848Z in-place PyTorch operations. Examples of side-effects include mutating 2025-09-07T06:57:22.3815027Z Python data structures and assigning values to variables not captured 2025-09-07T06:57:22.3815103Z in ``func``. 2025-09-07T06:57:22.3815170Z 2025-09-07T06:57:22.3815363Z One example of using :func:`vmap` is to compute batched dot products. PyTorch 2025-09-07T06:57:22.3815539Z doesn't provide a batched ``torch.dot`` API; instead of unsuccessfully 2025-09-07T06:57:22.3815720Z rummaging through docs, use :func:`vmap` to construct a new function. 2025-09-07T06:57:22.3815791Z 2025-09-07T06:57:22.3815884Z >>> torch.dot # [D], [D] -> [] 2025-09-07T06:57:22.3816050Z >>> batched_dot = torch.func.vmap(torch.dot) # [N, D], [N, D] -> [N] 2025-09-07T06:57:22.3816167Z >>> x, y = torch.randn(2, 5), torch.randn(2, 5) 2025-09-07T06:57:22.3816252Z >>> batched_dot(x, y) 2025-09-07T06:57:22.3816329Z 2025-09-07T06:57:22.3816513Z :func:`vmap` can be helpful in hiding batch dimensions, leading to a simpler 2025-09-07T06:57:22.3816612Z model authoring experience. 2025-09-07T06:57:22.3816680Z 2025-09-07T06:57:22.3816777Z >>> batch_size, feature_size = 3, 5 2025-09-07T06:57:22.3816923Z >>> weights = torch.randn(feature_size, requires_grad=True) 2025-09-07T06:57:22.3816995Z >>> 2025-09-07T06:57:22.3817092Z >>> def model(feature_vec): 2025-09-07T06:57:22.3817203Z >>> # Very simple linear model with activation 2025-09-07T06:57:22.3817318Z >>> return feature_vec.dot(weights).relu() 2025-09-07T06:57:22.3817392Z >>> 2025-09-07T06:57:22.3817527Z >>> examples = torch.randn(batch_size, feature_size) 2025-09-07T06:57:22.3817721Z >>> result = torch.vmap(model)(examples) 2025-09-07T06:57:22.3817802Z 2025-09-07T06:57:22.3818005Z :func:`vmap` can also help vectorize computations that were previously difficult 2025-09-07T06:57:22.3818202Z or impossible to batch. 
One example is higher-order gradient computation. 2025-09-07T06:57:22.3818389Z The PyTorch autograd engine computes vjps (vector-Jacobian products). 2025-09-07T06:57:22.3818584Z Computing a full Jacobian matrix for some function f: R^N -> R^N usually 2025-09-07T06:57:22.3818786Z requires N calls to ``autograd.grad``, one per Jacobian row. Using :func:`vmap`, 2025-09-07T06:57:22.3818989Z we can vectorize the whole computation, computing the Jacobian in a single 2025-09-07T06:57:22.3819078Z call to ``autograd.grad``. 2025-09-07T06:57:22.3819152Z 2025-09-07T06:57:22.3819228Z >>> # Setup 2025-09-07T06:57:22.3819302Z >>> N = 5 2025-09-07T06:57:22.3819392Z >>> f = lambda x: x**2 2025-09-07T06:57:22.3819497Z >>> x = torch.randn(N, requires_grad=True) 2025-09-07T06:57:22.3819583Z >>> y = f(x) 2025-09-07T06:57:22.3819672Z >>> I_N = torch.eye(N) 2025-09-07T06:57:22.3819756Z >>> 2025-09-07T06:57:22.3819847Z >>> # Sequential approach 2025-09-07T06:57:22.3820034Z >>> jacobian_rows = [torch.autograd.grad(y, x, v, retain_graph=True)[0] 2025-09-07T06:57:22.3820130Z >>> for v in I_N.unbind()] 2025-09-07T06:57:22.3820244Z >>> jacobian = torch.stack(jacobian_rows) 2025-09-07T06:57:22.3820320Z >>> 2025-09-07T06:57:22.3820431Z >>> # vectorized gradient computation 2025-09-07T06:57:22.3820610Z >>> def get_vjp(v): 2025-09-07T06:57:22.3820733Z >>> return torch.autograd.grad(y, x, v) 2025-09-07T06:57:22.3820840Z >>> jacobian = torch.vmap(get_vjp)(I_N) 2025-09-07T06:57:22.3820925Z 2025-09-07T06:57:22.3821139Z :func:`vmap` can also be nested, producing an output with multiple batched dimensions 2025-09-07T06:57:22.3821217Z 2025-09-07T06:57:22.3821389Z >>> torch.dot # [D], [D] -> [] 2025-09-07T06:57:22.3821546Z >>> batched_dot = torch.vmap( 2025-09-07T06:57:22.3821649Z ... torch.vmap(torch.dot) 2025-09-07T06:57:22.3821752Z ... 
) # [N1, N0, D], [N1, N0, D] -> [N1, N0] 2025-09-07T06:57:22.3821883Z >>> x, y = torch.randn(2, 3, 5), torch.randn(2, 3, 5) 2025-09-07T06:57:22.3821991Z >>> batched_dot(x, y) # tensor of size [2, 3] 2025-09-07T06:57:22.3822073Z 2025-09-07T06:57:22.3822285Z If the inputs are not batched along the first dimension, ``in_dims`` specifies 2025-09-07T06:57:22.3822431Z the dimension that each inputs are batched along as 2025-09-07T06:57:22.3822503Z 2025-09-07T06:57:22.3822602Z >>> torch.dot # [N], [N] -> [] 2025-09-07T06:57:22.3822787Z >>> batched_dot = torch.vmap(torch.dot, in_dims=1) # [N, D], [N, D] -> [D] 2025-09-07T06:57:22.3822902Z >>> x, y = torch.randn(2, 5), torch.randn(2, 5) 2025-09-07T06:57:22.3822988Z >>> batched_dot( 2025-09-07T06:57:22.3823079Z ... x, y 2025-09-07T06:57:22.3823243Z ... ) # output is [5] instead of [2] if batched along the 0th dimension 2025-09-07T06:57:22.3823332Z 2025-09-07T06:57:22.3823552Z If there are multiple inputs each of which is batched along different dimensions, 2025-09-07T06:57:22.3823721Z ``in_dims`` must be a tuple with the batch dimension for each input as 2025-09-07T06:57:22.3823801Z 2025-09-07T06:57:22.3823889Z >>> torch.dot # [D], [D] -> [] 2025-09-07T06:57:22.3824093Z >>> batched_dot = torch.vmap(torch.dot, in_dims=(0, None)) # [N, D], [D] -> [N] 2025-09-07T06:57:22.3824198Z >>> x, y = torch.randn(2, 5), torch.randn(5) 2025-09-07T06:57:22.3824286Z >>> batched_dot( 2025-09-07T06:57:22.3824361Z ... x, y 2025-09-07T06:57:22.3824530Z ... 
) # second arg doesn't have a batch dim because in_dim[1] was None 2025-09-07T06:57:22.3824599Z 2025-09-07T06:57:22.3824902Z If the input is a Python struct, ``in_dims`` must be a tuple containing a struct 2025-09-07T06:57:22.3825005Z matching the shape of the input: 2025-09-07T06:57:22.3825199Z 2025-09-07T06:57:22.3825480Z >>> f = lambda dict: torch.dot(dict["x"], dict["y"]) 2025-09-07T06:57:22.3825637Z >>> x, y = torch.randn(2, 5), torch.randn(5) 2025-09-07T06:57:22.3825797Z >>> input = {"x": x, "y": y} 2025-09-07T06:57:22.3826006Z >>> batched_dot = torch.vmap(f, in_dims=({"x": 0, "y": None},)) 2025-09-07T06:57:22.3826166Z >>> batched_dot(input) 2025-09-07T06:57:22.3826255Z 2025-09-07T06:57:22.3826647Z By default, the output is batched along the first dimension. However, it can be batched 2025-09-07T06:57:22.3826789Z along any dimension by using ``out_dims`` 2025-09-07T06:57:22.3835486Z 2025-09-07T06:57:22.3835617Z >>> f = lambda x: x**2 2025-09-07T06:57:22.3835731Z >>> x = torch.randn(2, 5) 2025-09-07T06:57:22.3835867Z >>> batched_pow = torch.vmap(f, out_dims=1) 2025-09-07T06:57:22.3835969Z >>> batched_pow(x) # [5, 2] 2025-09-07T06:57:22.3836058Z 2025-09-07T06:57:22.3836322Z For any function that uses kwargs, the returned function will not batch the kwargs but will 2025-09-07T06:57:22.3836418Z accept kwargs 2025-09-07T06:57:22.3836494Z 2025-09-07T06:57:22.3836599Z >>> x = torch.randn([2, 5]) 2025-09-07T06:57:22.3836693Z >>> def fn(x, scale=4.): 2025-09-07T06:57:22.3836788Z >>> return x * scale 2025-09-07T06:57:22.3836867Z >>> 2025-09-07T06:57:22.3836986Z >>> batched_pow = torch.vmap(fn) 2025-09-07T06:57:22.3837258Z >>> assert torch.allclose(batched_pow(x), x * 4) 2025-09-07T06:57:22.3837469Z >>> batched_pow(x, scale=x) # scale is not batched, output has shape [2, 2, 5] 2025-09-07T06:57:22.3837545Z 2025-09-07T06:57:22.3837644Z .. 
note:: 2025-09-07T06:57:22.3837846Z vmap does not provide general autobatching or handle variable-length 2025-09-07T06:57:22.3837944Z sequences out of the box. 2025-09-07T06:57:22.3838203Z 2025-09-07T06:57:22.3838634Z Original Error: IndentationError('expected an indented block after function definition on line 4', ('', 5, 1, '_._ = None\n', 5, 2)) 2025-09-07T06:57:22.3838720Z 2025-09-07T06:57:22.3838798Z _._ = None 2025-09-07T06:57:22.3838884Z ^ 2025-09-07T06:57:22.3838977Z warnings.warn(msg) 2025-09-07T06:57:22.3839061Z 2025-09-07T06:57:22.3839266Z --- Parse Warning: 16 / 17 --- 2025-09-07T06:57:22.3839958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=grad in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/apis.py line=306. 2025-09-07T06:57:22.3840174Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3840388Z ``grad`` operator helps computing gradients of ``func`` with respect to the 2025-09-07T06:57:22.3840565Z input(s) specified by ``argnums``. This operator can be nested to 2025-09-07T06:57:22.3840685Z compute higher-order gradients. 2025-09-07T06:57:22.3840760Z 2025-09-07T06:57:22.3840847Z Args: 2025-09-07T06:57:22.3841033Z func (Callable): A Python function that takes one or more arguments. 2025-09-07T06:57:22.3841268Z Must return a single-element Tensor. If specified ``has_aux`` equals ``True``, 2025-09-07T06:57:22.3841494Z function can return a tuple of single-element Tensor and other auxiliary objects: 2025-09-07T06:57:22.3841599Z ``(output, aux)``. 2025-09-07T06:57:22.3841826Z argnums (int or Tuple[int]): Specifies arguments to compute gradients with respect to. 2025-09-07T06:57:22.3842008Z ``argnums`` can be single integer or tuple of integers. Default: 0. 
2025-09-07T06:57:22.3842268Z has_aux (bool): Flag indicating that ``func`` returns a tensor and other 2025-09-07T06:57:22.3842433Z auxiliary objects: ``(output, aux)``. Default: False. 2025-09-07T06:57:22.3842509Z 2025-09-07T06:57:22.3842588Z Returns: 2025-09-07T06:57:22.3842829Z Function to compute gradients with respect to its inputs. By default, the output of 2025-09-07T06:57:22.3843023Z the function is the gradient tensor(s) with respect to the first argument. 2025-09-07T06:57:22.3843262Z If specified ``has_aux`` equals ``True``, tuple of gradients and output auxiliary objects 2025-09-07T06:57:22.3843470Z is returned. If ``argnums`` is a tuple of integers, a tuple of output gradients with 2025-09-07T06:57:22.3843611Z respect to each ``argnums`` value is returned. 2025-09-07T06:57:22.3843682Z 2025-09-07T06:57:22.3843783Z Example of using ``grad``: 2025-09-07T06:57:22.3843857Z 2025-09-07T06:57:22.3843954Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3844059Z >>> from torch.func import grad 2025-09-07T06:57:22.3844161Z >>> x = torch.randn([]) 2025-09-07T06:57:22.3844278Z >>> cos_x = grad(lambda x: torch.sin(x))(x) 2025-09-07T06:57:22.3844399Z >>> assert torch.allclose(cos_x, x.cos()) 2025-09-07T06:57:22.3844474Z >>> 2025-09-07T06:57:22.3844581Z >>> # Second-order gradients 2025-09-07T06:57:22.3844709Z >>> neg_sin_x = grad(grad(lambda x: torch.sin(x)))(x) 2025-09-07T06:57:22.3844837Z >>> assert torch.allclose(neg_sin_x, -x.sin()) 2025-09-07T06:57:22.3844910Z 2025-09-07T06:57:22.3845199Z When composed with ``vmap``, ``grad`` can be used to compute per-sample-gradients: 2025-09-07T06:57:22.3845283Z 2025-09-07T06:57:22.3845366Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3845472Z >>> from torch.func import grad, vmap 2025-09-07T06:57:22.3845594Z >>> batch_size, feature_size = 3, 5 2025-09-07T06:57:22.3845669Z >>> 2025-09-07T06:57:22.3845786Z >>> def model(weights, feature_vec): 2025-09-07T06:57:22.3845963Z >>> # Very simple linear model with activation 
2025-09-07T06:57:22.3846134Z >>> assert feature_vec.dim() == 1 2025-09-07T06:57:22.3846243Z >>> return feature_vec.dot(weights).relu() 2025-09-07T06:57:22.3846319Z >>> 2025-09-07T06:57:22.3846449Z >>> def compute_loss(weights, example, target): 2025-09-07T06:57:22.3846551Z >>> y = model(weights, example) 2025-09-07T06:57:22.3846682Z >>> return ((y - target) ** 2).mean() # MSELoss 2025-09-07T06:57:22.3846759Z >>> 2025-09-07T06:57:22.3846922Z >>> weights = torch.randn(feature_size, requires_grad=True) 2025-09-07T06:57:22.3847049Z >>> examples = torch.randn(batch_size, feature_size) 2025-09-07T06:57:22.3847160Z >>> targets = torch.randn(batch_size) 2025-09-07T06:57:22.3847267Z >>> inputs = (weights, examples, targets) 2025-09-07T06:57:22.3847478Z >>> grad_weight_per_example = vmap(grad(compute_loss), in_dims=(None, 0, 0))( 2025-09-07T06:57:22.3847563Z ... *inputs 2025-09-07T06:57:22.3847647Z ... ) 2025-09-07T06:57:22.3847719Z 2025-09-07T06:57:22.3847877Z Example of using ``grad`` with ``has_aux`` and ``argnums``: 2025-09-07T06:57:22.3847948Z 2025-09-07T06:57:22.3848039Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3848136Z >>> from torch.func import grad 2025-09-07T06:57:22.3848231Z >>> def my_loss_func(y, y_pred): 2025-09-07T06:57:22.3848363Z >>> loss_per_sample = (0.5 * y_pred - y) ** 2 2025-09-07T06:57:22.3848467Z >>> loss = loss_per_sample.mean() 2025-09-07T06:57:22.3848588Z >>> return loss, (y_pred, loss_per_sample) 2025-09-07T06:57:22.3848662Z >>> 2025-09-07T06:57:22.3848802Z >>> fn = grad(my_loss_func, argnums=(0, 1), has_aux=True) 2025-09-07T06:57:22.3848977Z >>> y_true = torch.rand(4) 2025-09-07T06:57:22.3849107Z >>> y_preds = torch.rand(4, requires_grad=True) 2025-09-07T06:57:22.3849203Z >>> out = fn(y_true, y_preds) 2025-09-07T06:57:22.3849419Z >>> # > output is ((grads w.r.t y_true, grads w.r.t y_preds), (y_pred, loss_per_sample)) 2025-09-07T06:57:22.3849492Z 2025-09-07T06:57:22.3849580Z .. 
note:: 2025-09-07T06:57:22.3849724Z Using PyTorch ``torch.no_grad`` together with ``grad``. 2025-09-07T06:57:22.3849803Z 2025-09-07T06:57:22.3849928Z Case 1: Using ``torch.no_grad`` inside a function: 2025-09-07T06:57:22.3850008Z 2025-09-07T06:57:22.3850096Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3850186Z >>> def f(x): 2025-09-07T06:57:22.3850281Z >>> with torch.no_grad(): 2025-09-07T06:57:22.3850365Z >>> c = x ** 2 2025-09-07T06:57:22.3850460Z >>> return x - c 2025-09-07T06:57:22.3850533Z 2025-09-07T06:57:22.3850718Z In this case, ``grad(f)(x)`` will respect the inner ``torch.no_grad``. 2025-09-07T06:57:22.3850788Z 2025-09-07T06:57:22.3850953Z Case 2: Using ``grad`` inside ``torch.no_grad`` context manager: 2025-09-07T06:57:22.3851024Z 2025-09-07T06:57:22.3851122Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3851213Z >>> with torch.no_grad(): 2025-09-07T06:57:22.3851304Z >>> grad(f)(x) 2025-09-07T06:57:22.3851374Z 2025-09-07T06:57:22.3851568Z In this case, ``grad`` will respect the inner ``torch.no_grad``, but not the 2025-09-07T06:57:22.3851833Z outer one. This is because ``grad`` is a "function transform": its result 2025-09-07T06:57:22.3852024Z should not depend on the result of a context manager outside of ``f``. 2025-09-07T06:57:22.3852095Z 2025-09-07T06:57:22.3852178Z 2025-09-07T06:57:22.3852665Z Original Error: IndentationError('expected an indented block after function definition on line 5', ('', 6, 1, '_._ = None\n', 6, 2)) 2025-09-07T06:57:22.3852800Z 2025-09-07T06:57:22.3852875Z _._ = None 2025-09-07T06:57:22.3852946Z ^ 2025-09-07T06:57:22.3853042Z warnings.warn(msg) 2025-09-07T06:57:22.3853114Z 2025-09-07T06:57:22.3853290Z --- Parse Warning: 17 / 17 --- 2025-09-07T06:57:22.3854126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/xdoctest/core.py:423: UserWarning: Cannot scrape callname=ReduceLROnPlateau in modpath=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py line=1236. 
2025-09-07T06:57:22.3854343Z Caused by: DoctestParseError('Failed to parse doctest in _package_groups') 2025-09-07T06:57:22.3854498Z Reduce learning rate when a metric has stopped improving. 2025-09-07T06:57:22.3854580Z 2025-09-07T06:57:22.3854749Z Models often benefit from reducing the learning rate by a factor 2025-09-07T06:57:22.3854922Z of 2-10 once learning stagnates. This scheduler reads a metrics 2025-09-07T06:57:22.3855088Z quantity and if no improvement is seen for a 'patience' number 2025-09-07T06:57:22.3855206Z of epochs, the learning rate is reduced. 2025-09-07T06:57:22.3855278Z 2025-09-07T06:57:22.3855363Z Args: 2025-09-07T06:57:22.3855481Z optimizer (Optimizer): Wrapped optimizer. 2025-09-07T06:57:22.3855623Z mode (str): One of `min`, `max`. In `min` mode, lr will 2025-09-07T06:57:22.3855757Z be reduced when the quantity monitored has stopped 2025-09-07T06:57:22.3855913Z decreasing; in `max` mode it will be reduced when the 2025-09-07T06:57:22.3856080Z quantity monitored has stopped increasing. Default: 'min'. 2025-09-07T06:57:22.3856239Z factor (float): Factor by which the learning rate will be 2025-09-07T06:57:22.3856365Z reduced. new_lr = lr * factor. Default: 0.1. 2025-09-07T06:57:22.3856645Z patience (int): The number of allowed epochs with no improvement after 2025-09-07T06:57:22.3856780Z which the learning rate will be reduced. 2025-09-07T06:57:22.3856978Z For example, consider the case of having no patience (`patience = 0`). 2025-09-07T06:57:22.3857282Z In the first epoch, a baseline is established and is always considered good as there's no previous baseline. 2025-09-07T06:57:22.3857456Z In the second epoch, if the performance is worse than the baseline, 2025-09-07T06:57:22.3857597Z we have what is considered an intolerable epoch. 2025-09-07T06:57:22.3857825Z Since the count of intolerable epochs (1) is greater than the patience level (0), 2025-09-07T06:57:22.3857969Z the learning rate is reduced at the end of this epoch. 
2025-09-07T06:57:22.3858229Z From the third epoch onwards, the learning rate continues to be reduced at the end of each epoch 2025-09-07T06:57:22.3858487Z if the performance is worse than the baseline. If the performance improves or remains the same, 2025-09-07T06:57:22.3858590Z the learning rate is not adjusted. 2025-09-07T06:57:22.3858676Z Default: 10. 2025-09-07T06:57:22.3858836Z threshold (float): Threshold for measuring the new optimum, 2025-09-07T06:57:22.3858975Z to only focus on significant changes. Default: 1e-4. 2025-09-07T06:57:22.3859115Z threshold_mode (str): One of `rel`, `abs`. In `rel` mode, 2025-09-07T06:57:22.3859247Z dynamic_threshold = best * ( 1 + threshold ) in 'max' 2025-09-07T06:57:22.3859456Z mode or best * ( 1 - threshold ) in `min` mode. 2025-09-07T06:57:22.3859588Z In `abs` mode, dynamic_threshold = best + threshold in 2025-09-07T06:57:22.3859736Z `max` mode or best - threshold in `min` mode. Default: 'rel'. 2025-09-07T06:57:22.3859883Z cooldown (int): Number of epochs to wait before resuming 2025-09-07T06:57:22.3860102Z normal operation after lr has been reduced. Default: 0. 2025-09-07T06:57:22.3860309Z min_lr (float or list): A scalar or a list of scalars. A 2025-09-07T06:57:22.3860438Z lower bound on the learning rate of all param groups 2025-09-07T06:57:22.3860554Z or each group respectively. Default: 0. 2025-09-07T06:57:22.3860695Z eps (float): Minimal decay applied to lr. If the difference 2025-09-07T06:57:22.3860855Z between new and old lr is smaller than eps, the update is 2025-09-07T06:57:22.3860950Z ignored. Default: 1e-8. 2025-09-07T06:57:22.3861022Z 2025-09-07T06:57:22.3861095Z Example: 2025-09-07T06:57:22.3861186Z >>> # xdoctest: +SKIP 2025-09-07T06:57:22.3861375Z >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) 2025-09-07T06:57:22.3861510Z >>> scheduler = ReduceLROnPlateau(optimizer, "min") 2025-09-07T06:57:22.3861596Z >>> for epoch in range(10): 2025-09-07T06:57:22.3861678Z >>> train(...) 
2025-09-07T06:57:22.3861776Z >>> val_loss = validate(...) 2025-09-07T06:57:22.3861904Z >>> # Note that step should be called after validate() 2025-09-07T06:57:22.3862000Z >>> scheduler.step(val_loss) 2025-09-07T06:57:22.3862068Z 2025-09-07T06:57:22.3862234Z .. image:: ../scripts/lr_scheduler_images/ReduceLROnPlateau.png 2025-09-07T06:57:22.3862302Z 2025-09-07T06:57:22.3862649Z Original Error: IndentationError('unexpected indent', ('', 8, 4, ' scheduler.step(val_loss)\n', 8, -1)) 2025-09-07T06:57:22.3862718Z 2025-09-07T06:57:22.3862812Z scheduler.step(val_loss) 2025-09-07T06:57:22.3862884Z ^ 2025-09-07T06:57:22.3862972Z warnings.warn(msg) 2025-09-07T06:57:22.3863037Z 2025-09-07T06:57:22.3863133Z  2025-09-07T06:57:22.3863348Z === Found 9 run-time warnings === 2025-09-07T06:57:22.3863500Z --- Runtime Warning: 1 / 9 --- 2025-09-07T06:57:22.3863760Z example = 2025-09-07T06:57:22.3864266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/library.py:282: UserWarning: Warning only once for all operators, other operators may also be overridden. 2025-09-07T06:57:22.3864518Z Overriding a previously registered kernel for the same operator and the same dispatch key 2025-09-07T06:57:22.3864688Z operator: aten::div.Tensor(Tensor self, Tensor other) -> Tensor 2025-09-07T06:57:22.3864930Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2025-09-07T06:57:22.3865012Z dispatch key: CPU 2025-09-07T06:57:22.3865355Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1079 2025-09-07T06:57:22.3866133Z new kernel: registered at :1 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:218.) 
2025-09-07T06:57:22.3866268Z impl_fn(self.ns, name.split("::")[-1], dispatch_key) 2025-09-07T06:57:22.3866341Z 2025-09-07T06:57:22.3866486Z --- Runtime Warning: 2 / 9 --- 2025-09-07T06:57:22.3866703Z example = 2025-09-07T06:57:22.3867750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py:1351: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /var/lib/jenkins/workspace/c10/core/TensorImpl.h:1971.) 2025-09-07T06:57:22.3867925Z return super().refine_names(names) 2025-09-07T06:57:22.3867994Z 2025-09-07T06:57:22.3868145Z --- Runtime Warning: 3 / 9 --- 2025-09-07T06:57:22.3868436Z example = 2025-09-07T06:57:22.3869755Z :1: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/SparseCsrTensorImpl.cpp:53.) 2025-09-07T06:57:22.3869825Z 2025-09-07T06:57:22.3869970Z --- Runtime Warning: 4 / 9 --- 2025-09-07T06:57:22.3870162Z example = 2025-09-07T06:57:22.3871595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nested/__init__.py:117: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. We recommend specifying layout=torch.jagged when constructing a nested tensor, as this layout receives active development, has better operator coverage, and works with torch.compile. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/NestedTensorImpl.cpp:178.) 
2025-09-07T06:57:22.3871796Z return torch._nested_tensor_from_tensor_list(ts, dtype, None, device, None) 2025-09-07T06:57:22.3871868Z 2025-09-07T06:57:22.3872008Z --- Runtime Warning: 5 / 9 --- 2025-09-07T06:57:22.3872242Z example = 2025-09-07T06:57:22.3872881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`. 2025-09-07T06:57:22.3872989Z WeightNorm.apply(module, name, dim) 2025-09-07T06:57:22.3873055Z 2025-09-07T06:57:22.3873324Z --- Runtime Warning: 6 / 9 --- 2025-09-07T06:57:22.3873576Z example = 2025-09-07T06:57:22.3874223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`. 2025-09-07T06:57:22.3874322Z WeightNorm.apply(module, name, dim) 2025-09-07T06:57:22.3874393Z 2025-09-07T06:57:22.3874534Z --- Runtime Warning: 7 / 9 --- 2025-09-07T06:57:22.3874770Z example = 2025-09-07T06:57:22.3875638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:392: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-09-07T06:57:22.3875726Z warnings.warn( 2025-09-07T06:57:22.3875795Z 2025-09-07T06:57:22.3875941Z --- Runtime Warning: 8 / 9 --- 2025-09-07T06:57:22.3876205Z example = 2025-09-07T06:57:22.3877040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:392: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-09-07T06:57:22.3877117Z warnings.warn( 2025-09-07T06:57:22.3877283Z 2025-09-07T06:57:22.3877424Z --- Runtime Warning: 9 / 9 
--- 2025-09-07T06:57:22.3877672Z example = 2025-09-07T06:57:22.3878925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/experimental/const_fold.py:278: UserWarning: Attempted to insert a get_attr Node with no underlying reference in the owning GraphModule! Call GraphModule.add_submodule to add the necessary submodule, GraphModule.add_parameter to add the necessary Parameter, or nn.Module.register_buffer to add the necessary buffer 2025-09-07T06:57:22.3879131Z new_node = root_const_gm.graph.get_attr(in_node.target) 2025-09-07T06:57:22.3879199Z 2025-09-07T06:57:22.3879440Z === 374 passed, 489 skipped, 26 warnings in 19.11 seconds === 2025-09-07T06:57:22.3879600Z Running test_autoload_disable 1/1 ... [2025-09-07 06:57:22.350767] 2025-09-07T06:57:22.6852099Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions 2025-09-07T06:57:26.4643638Z Preparing metadata (setup.py) ... [?25l- \ done 2025-09-07T06:57:26.4676485Z [?25hBuilding wheels for collected packages: torch_test_cpp_extension 2025-09-07T06:57:26.4687487Z  DEPRECATION: Building 'torch_test_cpp_extension' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torch_test_cpp_extension'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:01:06.8086415Z  Building wheel for torch_test_cpp_extension (setup.py) ... 
[?25l- \ | / - \ | / - \ | / - \ | / - \ | done 2025-09-07T07:01:06.8210153Z [?25h Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=12956725 sha256=03b7b948906f6a10e2636508f0139c2de25615bd35cb461ec055cb55e7ed995d 2025-09-07T07:01:06.8212104Z Stored in directory: /tmp/pip-ephem-wheel-cache-8vcbcawf/wheels/a9/2e/d7/a9e103243c0b754e2324c4ee6ddd055c388a2eefc520cfc979 2025-09-07T07:01:06.8241261Z Successfully built torch_test_cpp_extension 2025-09-07T07:01:07.1758791Z Installing collected packages: torch_test_cpp_extension 2025-09-07T07:01:07.3955875Z Successfully installed torch_test_cpp_extension-0.0.0 2025-09-07T07:01:10.0427627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:01:10.0429752Z import pkg_resources 2025-09-07T07:01:10.0866187Z 2025-09-07T07:01:10.0866434Z Running tests... 2025-09-07T07:01:10.0866933Z ---------------------------------------------------------------------- 2025-09-07T07:01:10.3450117Z s 2025-09-07T07:01:10.3450569Z ---------------------------------------------------------------------- 2025-09-07T07:01:10.3451132Z Ran 1 test in 0.258s 2025-09-07T07:01:10.3451372Z 2025-09-07T07:01:10.3451538Z OK (skipped=1) 2025-09-07T07:01:10.3451736Z 2025-09-07T07:01:10.3451914Z Generating XML reports... 2025-09-07T07:01:10.9366845Z Running test_autoload_enable 1/1 ... [2025-09-07 07:01:10.936421] 2025-09-07T07:01:11.2965519Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions 2025-09-07T07:01:15.0884489Z Preparing metadata (setup.py) ... 
[?25l- \ done 2025-09-07T07:01:15.0917201Z [?25hBuilding wheels for collected packages: torch_test_cpp_extension 2025-09-07T07:01:15.0928950Z  DEPRECATION: Building 'torch_test_cpp_extension' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torch_test_cpp_extension'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:01:24.3876591Z  Building wheel for torch_test_cpp_extension (setup.py) ... [?25l- \ | / - \ | / - \ | / - \ | / - \ | done 2025-09-07T07:01:24.3999807Z [?25h Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=12956725 sha256=f9774d91e04799c3d71b310df9e0ba2d3e4fb6f7ab8c40682f34f1e35809f8a7 2025-09-07T07:01:24.4001871Z Stored in directory: /tmp/pip-ephem-wheel-cache-jsewqxsu/wheels/a9/2e/d7/a9e103243c0b754e2324c4ee6ddd055c388a2eefc520cfc979 2025-09-07T07:01:24.4028731Z Successfully built torch_test_cpp_extension 2025-09-07T07:01:24.7527653Z Installing collected packages: torch_test_cpp_extension 2025-09-07T07:01:24.9803891Z Successfully installed torch_test_cpp_extension-0.0.0 2025-09-07T07:01:27.6224366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:01:27.6226814Z import pkg_resources 2025-09-07T07:01:27.6654452Z 2025-09-07T07:01:27.6654605Z Running tests... 
2025-09-07T07:01:27.6655075Z ---------------------------------------------------------------------- 2025-09-07T07:01:27.9226060Z s 2025-09-07T07:01:27.9226496Z ---------------------------------------------------------------------- 2025-09-07T07:01:27.9227042Z Ran 1 test in 0.257s 2025-09-07T07:01:27.9227291Z 2025-09-07T07:01:27.9227441Z OK (skipped=1) 2025-09-07T07:01:27.9227669Z 2025-09-07T07:01:27.9227832Z Generating XML reports... 2025-09-07T07:01:28.5155372Z Running test_cpp_extensions_aot_ninja 1/1 ... [2025-09-07 07:01:28.515267] 2025-09-07T07:01:28.9140818Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions 2025-09-07T07:01:32.6807553Z Preparing metadata (setup.py) ... done 2025-09-07T07:01:32.6837018Z Building wheels for collected packages: torch_test_cpp_extension 2025-09-07T07:01:32.6848903Z  DEPRECATION: Building 'torch_test_cpp_extension' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torch_test_cpp_extension'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:04:23.0556725Z  Building wheel for torch_test_cpp_extension (setup.py) ...
[?25l- \ | / - \ | / - \ | / - \ | / - \ | done 2025-09-07T07:04:23.0680737Z [?25h Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=12956955 sha256=34079ecbf23e7eb16217c80361858a698299713010caa6680f7e40759b88a427 2025-09-07T07:04:23.0685108Z Stored in directory: /tmp/pip-ephem-wheel-cache-g3esb2ml/wheels/a9/2e/d7/a9e103243c0b754e2324c4ee6ddd055c388a2eefc520cfc979 2025-09-07T07:04:23.0700949Z Successfully built torch_test_cpp_extension 2025-09-07T07:04:23.4141152Z Installing collected packages: torch_test_cpp_extension 2025-09-07T07:04:23.6480497Z Successfully installed torch_test_cpp_extension-0.0.0 2025-09-07T07:04:24.0311065Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/no_python_abi_suffix_test 2025-09-07T07:04:25.7285605Z Preparing metadata (setup.py) ... [?25l- done 2025-09-07T07:04:25.7315675Z [?25hBuilding wheels for collected packages: no_python_abi_suffix_test 2025-09-07T07:04:25.7327526Z  DEPRECATION: Building 'no_python_abi_suffix_test' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'no_python_abi_suffix_test'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:04:27.7886314Z  Building wheel for no_python_abi_suffix_test (setup.py) ... 
done 2025-09-07T07:04:27.7894523Z  Created wheel for no_python_abi_suffix_test: filename=no_python_abi_suffix_test-0.0.0-cp310-cp310-linux_x86_64.whl size=2944 sha256=1a9ea55ad295cad12150356e3012605a0bb9f407e4b23959be921354087abc5c 2025-09-07T07:04:27.7895905Z Stored in directory: /tmp/pip-ephem-wheel-cache-bmcixdqp/wheels/01/96/31/d3c48c51cc163420d8b3b57e95a07fda055add3ed0ea48001b 2025-09-07T07:04:27.7914539Z Successfully built no_python_abi_suffix_test 2025-09-07T07:04:28.1354323Z Installing collected packages: no_python_abi_suffix_test 2025-09-07T07:04:28.1403414Z Successfully installed no_python_abi_suffix_test-0.0.0 2025-09-07T07:04:28.5154802Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/python_agnostic_extension 2025-09-07T07:04:30.6711281Z Preparing metadata (setup.py) ... done 2025-09-07T07:04:30.6741255Z Building wheels for collected packages: python_agnostic 2025-09-07T07:04:30.6751523Z  DEPRECATION: Building 'python_agnostic' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'python_agnostic'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:04:49.7582741Z  Building wheel for python_agnostic (setup.py) ...
[?25l- \ | done 2025-09-07T07:04:49.7590450Z [?25h Created wheel for python_agnostic: filename=python_agnostic-0.0-cp39-abi3-linux_x86_64.whl size=21172 sha256=c05998e796138d5782427f21991258dd8b5b3a30d7f0ebf0f13fdb3f1039b304 2025-09-07T07:04:49.7592385Z Stored in directory: /tmp/pip-ephem-wheel-cache-hcsyy7h1/wheels/70/18/03/a6c0c2f80177a127cd534840cb967a6c872dd5a46747d888e8 2025-09-07T07:04:49.7609753Z Successfully built python_agnostic 2025-09-07T07:04:50.1338056Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/libtorch_agnostic_extension 2025-09-07T07:04:52.5654545Z Preparing metadata (setup.py) ... [?25l- done 2025-09-07T07:04:52.5685250Z [?25hRequirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic==0.0) (2.9.0a0+git93fb23d) 2025-09-07T07:04:52.5709757Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (3.19.1) 2025-09-07T07:04:52.5714796Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (4.15.0) 2025-09-07T07:04:52.5719049Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (1.13.3) 2025-09-07T07:04:52.5722735Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (2.8.8) 2025-09-07T07:04:52.5726079Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (3.1.6) 2025-09-07T07:04:52.5730462Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (2025.7.0) 2025-09-07T07:04:52.6102836Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic==0.0) (1.3.0) 2025-09-07T07:04:52.6136176Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic==0.0) (3.0.2) 2025-09-07T07:04:52.6145318Z Building wheels for collected packages: libtorch_agnostic 2025-09-07T07:04:52.6155128Z  DEPRECATION: Building 'libtorch_agnostic' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'libtorch_agnostic'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:04:56.9627808Z  Building wheel for libtorch_agnostic (setup.py) ... [?25l- \ | done 2025-09-07T07:04:56.9634490Z [?25h Created wheel for libtorch_agnostic: filename=libtorch_agnostic-0.0-cp39-abi3-linux_x86_64.whl size=32524 sha256=e48505989f2aab71759336d9b0fda8557fc2f80e6606d30815c6e22b6fb86b11 2025-09-07T07:04:56.9635589Z Stored in directory: /tmp/pip-ephem-wheel-cache-klkqxhkq/wheels/0d/08/74/4ba0a92b390e7b767925227eeb64822a849cf3565e6a5de83a 2025-09-07T07:04:56.9654543Z Successfully built libtorch_agnostic 2025-09-07T07:04:57.2674623Z Installing collected packages: libtorch_agnostic 2025-09-07T07:04:57.2742860Z Successfully installed libtorch_agnostic-0.0 2025-09-07T07:04:57.3216295Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:04:57.3221615Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_aot_ninja.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:04:57.321872] 2025-09-07T07:05:00.9919092Z 2025-09-07T07:05:00.9920327Z test_cpp_extensions_aot_ninja 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_aot_ninja_1.1_6335811979fad441_.log 2025-09-07T07:05:00.9930952Z Running 21 items in this shard: test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_backward, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cublas_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cuda_dlink_libs, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cuda_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cusolver_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_extension_function, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_extension_module, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_mps_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_no_python_abi_suffix_sets_the_correct_library_name, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_optional, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_sycl_extension, test/test_cpp_extensions_aot_ninja.py::TestPybindTypeCasters::test_pybind_return_types, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_add, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_autocast_apis_for_maia_device, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_conv_backend_override, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_matmul_autocast_default_precision, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_matmul_autocast_float16_precision, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_unregistered, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_zeros, test/test_cpp_extensions_aot_ninja.py::TestRNGExtension::test_rng, 
test/test_cpp_extensions_aot_ninja.py::TestTorchLibrary::test_torch_library 2025-09-07T07:05:00.9937189Z 2025-09-07T07:05:00.9937438Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T07:05:00.9938030Z Uploading artifacts took 0.00 seconds 2025-09-07T07:05:00.9938418Z Running test_cpp_extensions_aot_no_ninja 1/1 ... [2025-09-07 07:05:00.992685] 2025-09-07T07:05:01.3505807Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions 2025-09-07T07:05:05.0654211Z Preparing metadata (setup.py) ... done 2025-09-07T07:05:05.0682985Z Building wheels for collected packages: torch_test_cpp_extension 2025-09-07T07:05:05.0691540Z  DEPRECATION: Building 'torch_test_cpp_extension' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torch_test_cpp_extension'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:05:14.3079439Z  Building wheel for torch_test_cpp_extension (setup.py) ...
[?25l- \ | / - \ | / - \ | / - \ | / - \ | done 2025-09-07T07:05:14.3201647Z [?25h Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=12956725 sha256=8b34f53f5a3ddbc6195b0a70938ea879ec856a3680a24f6d9f286d4e99653adf 2025-09-07T07:05:14.3203802Z Stored in directory: /tmp/pip-ephem-wheel-cache-n0ajd75m/wheels/a9/2e/d7/a9e103243c0b754e2324c4ee6ddd055c388a2eefc520cfc979 2025-09-07T07:05:14.3227126Z Successfully built torch_test_cpp_extension 2025-09-07T07:05:14.6693588Z Installing collected packages: torch_test_cpp_extension 2025-09-07T07:05:14.8958356Z Successfully installed torch_test_cpp_extension-0.0.0 2025-09-07T07:05:15.2822088Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/no_python_abi_suffix_test 2025-09-07T07:05:16.9923210Z Preparing metadata (setup.py) ... [?25l- done 2025-09-07T07:05:16.9952031Z [?25hBuilding wheels for collected packages: no_python_abi_suffix_test 2025-09-07T07:05:16.9963672Z  DEPRECATION: Building 'no_python_abi_suffix_test' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'no_python_abi_suffix_test'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:05:18.9723201Z  Building wheel for no_python_abi_suffix_test (setup.py) ... 
done 2025-09-07T07:05:18.9731010Z  Created wheel for no_python_abi_suffix_test: filename=no_python_abi_suffix_test-0.0.0-cp310-cp310-linux_x86_64.whl size=2944 sha256=6fe3babb249aad373135a0769de2e6665e072c03b36ec1c92e102f0ead17acfc 2025-09-07T07:05:18.9733380Z Stored in directory: /tmp/pip-ephem-wheel-cache-805goimn/wheels/01/96/31/d3c48c51cc163420d8b3b57e95a07fda055add3ed0ea48001b 2025-09-07T07:05:18.9750888Z Successfully built no_python_abi_suffix_test 2025-09-07T07:05:19.3230702Z Installing collected packages: no_python_abi_suffix_test 2025-09-07T07:05:19.3276582Z Successfully installed no_python_abi_suffix_test-0.0.0 2025-09-07T07:05:19.7065403Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/python_agnostic_extension 2025-09-07T07:05:21.8698363Z Preparing metadata (setup.py) ... done 2025-09-07T07:05:21.8729292Z Building wheels for collected packages: python_agnostic 2025-09-07T07:05:21.8739850Z  DEPRECATION: Building 'python_agnostic' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'python_agnostic'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:05:24.5985516Z  Building wheel for python_agnostic (setup.py) ...
[?25l- \ | done 2025-09-07T07:05:24.5994547Z [?25h Created wheel for python_agnostic: filename=python_agnostic-0.0-cp39-abi3-linux_x86_64.whl size=21172 sha256=3e8834d2668ed2a773342a74f3c1a9a8da3da78b81c90ee67cbd230fa708a9d2 2025-09-07T07:05:24.5996420Z Stored in directory: /tmp/pip-ephem-wheel-cache-bjnwpv7w/wheels/70/18/03/a6c0c2f80177a127cd534840cb967a6c872dd5a46747d888e8 2025-09-07T07:05:24.6021194Z Successfully built python_agnostic 2025-09-07T07:05:24.9803435Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/libtorch_agnostic_extension 2025-09-07T07:05:27.4134390Z Preparing metadata (setup.py) ... [?25l- done 2025-09-07T07:05:27.4165934Z [?25hRequirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic==0.0) (2.9.0a0+git93fb23d) 2025-09-07T07:05:27.4192249Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (3.19.1) 2025-09-07T07:05:27.4197118Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (4.15.0) 2025-09-07T07:05:27.4201214Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (1.13.3) 2025-09-07T07:05:27.4205505Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (2.8.8) 2025-09-07T07:05:27.4208947Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (3.1.6) 2025-09-07T07:05:27.4213318Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic==0.0) (2025.7.0) 2025-09-07T07:05:27.4593446Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic==0.0) (1.3.0) 2025-09-07T07:05:27.4627884Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic==0.0) (3.0.2) 2025-09-07T07:05:27.4636830Z Building wheels for collected packages: libtorch_agnostic 2025-09-07T07:05:27.4646660Z  DEPRECATION: Building 'libtorch_agnostic' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'libtorch_agnostic'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T07:05:30.4146997Z  Building wheel for libtorch_agnostic (setup.py) ... [?25l- \ | done 2025-09-07T07:05:30.4154156Z [?25h Created wheel for libtorch_agnostic: filename=libtorch_agnostic-0.0-cp39-abi3-linux_x86_64.whl size=32524 sha256=e717b197d031752d4820d6e3d9b3dc1de8a7899659982abc48bf4b9fdabbeeee 2025-09-07T07:05:30.4155999Z Stored in directory: /tmp/pip-ephem-wheel-cache-kp_ue_0e/wheels/0d/08/74/4ba0a92b390e7b767925227eeb64822a849cf3565e6a5de83a 2025-09-07T07:05:30.4181161Z Successfully built libtorch_agnostic 2025-09-07T07:05:30.7307090Z Installing collected packages: libtorch_agnostic 2025-09-07T07:05:30.7375204Z Successfully installed libtorch_agnostic-0.0 2025-09-07T07:05:30.7901651Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:05:30.7906153Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_aot_no_ninja.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:05:30.790353] 2025-09-07T07:05:34.4602497Z 2025-09-07T07:05:34.4603590Z test_cpp_extensions_aot_no_ninja 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_aot_no_ninja_1.1_4abee5ca3de63b22_.log 2025-09-07T07:05:34.4615094Z Running 21 items in this shard: test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_backward, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cublas_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cuda_dlink_libs, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cuda_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cusolver_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_extension_function, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_extension_module, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_mps_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_no_python_abi_suffix_sets_the_correct_library_name, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_optional, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_sycl_extension, test/test_cpp_extensions_aot_no_ninja.py::TestPybindTypeCasters::test_pybind_return_types, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_add, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_autocast_apis_for_maia_device, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_conv_backend_override, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_matmul_autocast_default_precision, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_matmul_autocast_float16_precision, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_unregistered, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_zeros, 
test/test_cpp_extensions_aot_no_ninja.py::TestRNGExtension::test_rng, test/test_cpp_extensions_aot_no_ninja.py::TestTorchLibrary::test_torch_library 2025-09-07T07:05:34.4621869Z 2025-09-07T07:05:34.4622074Z Running inductor/test_aot_inductor 1/1 ... [2025-09-07 07:05:34.460896] 2025-09-07T07:05:34.4622453Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:05:34.4623484Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:05:34.461222] 2025-09-07T07:05:41.6862498Z 2025-09-07T07:05:41.6863905Z inductor/test_aot_inductor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_1.1_89cc43dc1dc838e4_.log 2025-09-07T07:05:41.6865106Z Running 0 items in this shard: 2025-09-07T07:05:41.6865396Z 2025-09-07T07:05:41.6867904Z Running inductor/test_triton_extension_backend 1/1 ... [2025-09-07 07:05:41.686592] 2025-09-07T07:05:41.6868678Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:05:41.6871298Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_extension_backend.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:05:41.686935] 2025-09-07T07:05:48.7114571Z 2025-09-07T07:05:48.7115953Z inductor/test_triton_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_extension_backend_1.1_9d973338ac1d372c_.log 2025-09-07T07:05:48.7117273Z Running 0 items in this shard: 2025-09-07T07:05:48.7117553Z 2025-09-07T07:05:48.7118407Z Running inductor/test_compiled_autograd 2/2 ... 
[2025-09-07 07:05:48.711654] 2025-09-07T07:05:48.7119074Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:05:48.7122700Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '-m', 'serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:05:48.711988] 2025-09-07T07:05:57.2889806Z 2025-09-07T07:05:57.2891733Z inductor/test_compiled_autograd 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_2.2_cf71fddcab53705d_.log 2025-09-07T07:05:57.2893261Z Running 0 items in this shard: 2025-09-07T07:05:57.2893567Z 2025-09-07T07:05:57.2894216Z Running test_comparison_utils 1/1 ... [2025-09-07 07:05:57.289216] 2025-09-07T07:05:57.2894859Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:05:57.2897693Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_comparison_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:05:57.289593] 2025-09-07T07:06:00.5091817Z 2025-09-07T07:06:00.5092872Z test_comparison_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_comparison_utils_1.1_0165be94c99f271e_.log 2025-09-07T07:06:00.5094112Z Running 0 items in this shard: 2025-09-07T07:06:00.5094411Z 2025-09-07T07:06:00.5096267Z Running inductor/test_provenance_tracing 1/1 ... 
[2025-09-07 07:06:00.509380] 2025-09-07T07:06:00.5097125Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:00.5099654Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_provenance_tracing.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:00.509716] 2025-09-07T07:06:07.0835663Z 2025-09-07T07:06:07.0836866Z inductor/test_provenance_tracing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_provenance_tracing_1.1_79d55ccb004e8f41_.log 2025-09-07T07:06:07.0838160Z Running 0 items in this shard: 2025-09-07T07:06:07.0838447Z 2025-09-07T07:06:07.0842487Z Running export/test_functionalized_assertions 1/1 ... [2025-09-07 07:06:07.084006] 2025-09-07T07:06:07.0843372Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:07.0847375Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_functionalized_assertions.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:07.084510] 2025-09-07T07:06:10.3038764Z 2025-09-07T07:06:10.3040027Z export/test_functionalized_assertions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_functionalized_assertions_1.1_645550099967c686_.log 2025-09-07T07:06:10.3041370Z Running 0 items in this shard: 2025-09-07T07:06:10.3041659Z 2025-09-07T07:06:10.3045374Z Running test_license 1/1 ... 
[2025-09-07 07:06:10.304348] 2025-09-07T07:06:10.3045941Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:10.3049661Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_license.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:10.304744] 2025-09-07T07:06:13.5240452Z 2025-09-07T07:06:13.5241235Z test_license 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_license_1.1_99a0a156724fe649_.log 2025-09-07T07:06:13.5242376Z Running 0 items in this shard: 2025-09-07T07:06:13.5242717Z 2025-09-07T07:06:13.5247670Z Running dynamo/test_base_output 1/1 ... [2025-09-07 07:06:13.524570] 2025-09-07T07:06:13.5248294Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:13.5251571Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_base_output.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:13.524907] 2025-09-07T07:06:16.9944506Z 2025-09-07T07:06:16.9945730Z dynamo/test_base_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_base_output_1.1_ae2f3f84661adc28_.log 2025-09-07T07:06:16.9947425Z Running 0 items in this shard: 2025-09-07T07:06:16.9947944Z 2025-09-07T07:06:16.9953857Z Running inductor/test_triton_kernels 1/1 ... [2025-09-07 07:06:16.994867] 2025-09-07T07:06:16.9954460Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:16.9955797Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:06:16.995192] 2025-09-07T07:06:23.8194616Z 2025-09-07T07:06:23.8195721Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_2a630e1859578ab1_.log 2025-09-07T07:06:23.8197063Z Running 0 items in this shard: 2025-09-07T07:06:23.8197410Z 2025-09-07T07:06:23.8200421Z Running test_mkldnn_verbose 1/1 ... [2025-09-07 07:06:23.819802] 2025-09-07T07:06:23.8200964Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:23.8203425Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn_verbose.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:23.820136] 2025-09-07T07:06:27.0395036Z 2025-09-07T07:06:27.0396067Z test_mkldnn_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_verbose_1.1_ebf794114bfd6857_.log 2025-09-07T07:06:27.0397199Z Running 0 items in this shard: 2025-09-07T07:06:27.0397488Z 2025-09-07T07:06:27.0400662Z Running inductor/test_inductor_utils 1/1 ... [2025-09-07 07:06:27.039902] 2025-09-07T07:06:27.0401150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:27.0404918Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:06:27.040232] 2025-09-07T07:06:30.2594349Z 2025-09-07T07:06:30.2595597Z inductor/test_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_utils_1.1_7c06cf34bb3072ba_.log 2025-09-07T07:06:30.2596817Z Running 0 items in this shard: 2025-09-07T07:06:30.2597100Z 2025-09-07T07:06:30.2601474Z Running inductor/test_flex_decoding 1/1 ... [2025-09-07 07:06:30.259880] 2025-09-07T07:06:30.2602252Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:30.2606385Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_decoding.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:30.260365] 2025-09-07T07:06:37.3848681Z 2025-09-07T07:06:37.3849968Z inductor/test_flex_decoding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_1.1_a4dcd45ab6fa1447_.log 2025-09-07T07:06:37.3853786Z Running 4 items in this shard: test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_121_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_17_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_24_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_94_float16_cuda_float16 2025-09-07T07:06:37.3857515Z 2025-09-07T07:06:37.3857914Z Running cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable 1/1 ... 
[2025-09-07 07:06:37.385266] 2025-09-07T07:06:37.3858466Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:37.3860206Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:37.385810] 2025-09-07T07:06:40.6052986Z 2025-09-07T07:06:40.6055054Z cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp_extensions.torch_stable_test_extension.torch_stable_test.test_torch_stable_1.1_0af000d3d4719b2e_.log 2025-09-07T07:06:40.6057173Z Running 0 items in this shard: 2025-09-07T07:06:40.6057530Z 2025-09-07T07:06:40.6058630Z Running inductor/test_analysis 1/1 ... [2025-09-07 07:06:40.605642] 2025-09-07T07:06:40.6059254Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:40.6062184Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_analysis.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:40.605964] 2025-09-07T07:06:47.7306561Z 2025-09-07T07:06:47.7307428Z inductor/test_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_analysis_1.1_bbb1c9ea121d65c3_.log 2025-09-07T07:06:47.7308283Z Running 0 items in this shard: 2025-09-07T07:06:47.7308500Z 2025-09-07T07:06:47.7313103Z Running test_rename_privateuse1_to_existing_device 1/1 ... 
[2025-09-07 07:06:47.731121] 2025-09-07T07:06:47.7313663Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:47.7318327Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_rename_privateuse1_to_existing_device.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:47.731626] 2025-09-07T07:06:51.0013990Z 2025-09-07T07:06:51.0015557Z test_rename_privateuse1_to_existing_device 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_rename_privateuse1_to_existing_device_1.1_8df9e6b175bb269c_.log 2025-09-07T07:06:51.0016909Z Running 0 items in this shard: 2025-09-07T07:06:51.0017181Z 2025-09-07T07:06:51.0019088Z Running inductor/test_cutedsl_template 1/1 ... [2025-09-07 07:06:51.001695] 2025-09-07T07:06:51.0019766Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:51.0022466Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:51.002032] 2025-09-07T07:06:57.6761522Z 2025-09-07T07:06:57.6762780Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_744f22efea4b8462_.log 2025-09-07T07:06:57.6764041Z Running 0 items in this shard: 2025-09-07T07:06:57.6764350Z 2025-09-07T07:06:57.6765463Z Running inductor/test_ck_backend 1/1 ... 
[2025-09-07 07:06:57.676324] 2025-09-07T07:06:57.6766097Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:06:57.6769690Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_ck_backend.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:06:57.676661] 2025-09-07T07:07:04.4508361Z 2025-09-07T07:07:04.4509422Z inductor/test_ck_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_ck_backend_1.1_422dd2c290b84c44_.log 2025-09-07T07:07:04.4510563Z Running 0 items in this shard: 2025-09-07T07:07:04.4510845Z 2025-09-07T07:07:04.4513954Z Running inductor/test_memory_planning 1/1 ... [2025-09-07 07:07:04.451172] 2025-09-07T07:07:04.4515148Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:04.4518121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_memory_planning.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:04.451524] 2025-09-07T07:07:11.5762159Z 2025-09-07T07:07:11.5763469Z inductor/test_memory_planning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_memory_planning_1.1_68142b76382a4e67_.log 2025-09-07T07:07:11.5764763Z Running 0 items in this shard: 2025-09-07T07:07:11.5765050Z 2025-09-07T07:07:11.5768070Z Running export/test_export_with_inline_and_install 1/1 ... 
[2025-09-07 07:07:11.576570] 2025-09-07T07:07:11.5768799Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:11.5772226Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_export_with_inline_and_install.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:11.576909] 2025-09-07T07:07:18.7014037Z 2025-09-07T07:07:18.7015509Z export/test_export_with_inline_and_install 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_with_inline_and_install_1.1_3028607793aacc45_.log 2025-09-07T07:07:18.7017097Z Running 0 items in this shard: 2025-09-07T07:07:18.7017431Z 2025-09-07T07:07:18.7020640Z Running dynamo/test_skip_guard_eval_unsafe 1/1 ... [2025-09-07 07:07:18.701827] 2025-09-07T07:07:18.7021158Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:18.7024688Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_guard_eval_unsafe.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:18.702207] 2025-09-07T07:07:22.1219045Z 2025-09-07T07:07:22.1220404Z dynamo/test_skip_guard_eval_unsafe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_guard_eval_unsafe_1.1_05346d902d6230e3_.log 2025-09-07T07:07:22.1221911Z Running 0 items in this shard: 2025-09-07T07:07:22.1222253Z 2025-09-07T07:07:22.1224347Z Running inductor/test_inplace_padding 1/1 ... 
[2025-09-07 07:07:22.122240] 2025-09-07T07:07:22.1225015Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:22.1228446Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inplace_padding.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:22.122558] 2025-09-07T07:07:29.1469239Z 2025-09-07T07:07:29.1470593Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_b7ac66ac06c97074_.log 2025-09-07T07:07:29.1472261Z Running 1 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel 2025-09-07T07:07:29.1473118Z 2025-09-07T07:07:29.1474038Z Running dynamo/test_buffers_override 1/1 ... [2025-09-07 07:07:29.147156] 2025-09-07T07:07:29.1474815Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:29.1477631Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_buffers_override.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:29.147520] 2025-09-07T07:07:32.3667378Z 2025-09-07T07:07:32.3668548Z dynamo/test_buffers_override 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_buffers_override_1.1_0bd00cb2857ce019_.log 2025-09-07T07:07:32.3669825Z Running 0 items in this shard: 2025-09-07T07:07:32.3670104Z 2025-09-07T07:07:32.3671254Z Running test_custom_ops 1/1 ... 
[2025-09-07 07:07:32.366929] 2025-09-07T07:07:32.3671881Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:32.3675348Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_custom_ops.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:32.367245] 2025-09-07T07:07:37.0884424Z 2025-09-07T07:07:37.0885434Z test_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_custom_ops_1.1_575cff2153e269e5_.log 2025-09-07T07:07:37.0886509Z Running 0 items in this shard: 2025-09-07T07:07:37.0886793Z 2025-09-07T07:07:37.0889136Z Running inductor/test_b2b_gemm 1/1 ... [2025-09-07 07:07:37.088684] 2025-09-07T07:07:37.0889765Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:37.0892974Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_b2b_gemm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:37.089015] 2025-09-07T07:07:43.7631748Z 2025-09-07T07:07:43.7632685Z inductor/test_b2b_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_b2b_gemm_1.1_0a0af703655e73c9_.log 2025-09-07T07:07:43.7633805Z Running 0 items in this shard: 2025-09-07T07:07:43.7634089Z 2025-09-07T07:07:43.7636230Z Running functorch/test_ac_logging 1/1 ... [2025-09-07 07:07:43.763381] 2025-09-07T07:07:43.7636958Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:43.7640360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac_logging.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:07:43.763732] 2025-09-07T07:07:46.9830180Z 2025-09-07T07:07:46.9831497Z functorch/test_ac_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_logging_1.1_4db0d6f38be971a7_.log 2025-09-07T07:07:46.9832894Z Running 0 items in this shard: 2025-09-07T07:07:46.9833231Z 2025-09-07T07:07:46.9836862Z Running inductor/test_inductor_annotations 1/1 ... [2025-09-07 07:07:46.983482] 2025-09-07T07:07:46.9837579Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:46.9840593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_annotations.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:46.983810] 2025-09-07T07:07:54.2088103Z 2025-09-07T07:07:54.2089288Z inductor/test_inductor_annotations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_annotations_1.1_5184ffffbcd4032e_.log 2025-09-07T07:07:54.2090567Z Running 0 items in this shard: 2025-09-07T07:07:54.2090848Z 2025-09-07T07:07:54.2094194Z Running dynamo/test_resume 1/1 ... [2025-09-07 07:07:54.209195] 2025-09-07T07:07:54.2094790Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:54.2098270Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_resume.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:07:54.209543] 2025-09-07T07:07:57.4288224Z 2025-09-07T07:07:57.4289308Z dynamo/test_resume 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_resume_1.1_9be1cba6aa6d6cd3_.log 2025-09-07T07:07:57.4290434Z Running 0 items in this shard: 2025-09-07T07:07:57.4290727Z 2025-09-07T07:07:57.4295601Z Running inductor/test_template_heuristics_registry 1/1 ... [2025-09-07 07:07:57.429303] 2025-09-07T07:07:57.4297016Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:07:57.4299379Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_template_heuristics_registry.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:07:57.429640] 2025-09-07T07:08:01.9509867Z 2025-09-07T07:08:01.9511416Z inductor/test_template_heuristics_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_template_heuristics_registry_1.1_cc936613c730fb17_.log 2025-09-07T07:08:01.9512868Z Running 0 items in this shard: 2025-09-07T07:08:01.9513153Z 2025-09-07T07:08:01.9513505Z Running inductor/test_debug_trace 1/1 ... [2025-09-07 07:08:01.951165] 2025-09-07T07:08:01.9514141Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:01.9516611Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:08:01.951494] 2025-09-07T07:08:09.0259550Z 2025-09-07T07:08:09.0260763Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_56317c37348f2107_.log 2025-09-07T07:08:09.0261942Z Running 0 items in this shard: 2025-09-07T07:08:09.0262258Z 2025-09-07T07:08:09.0265090Z Running test_ao_sparsity 1/1 ... [2025-09-07 07:08:09.026286] 2025-09-07T07:08:09.0265677Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:09.0269953Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ao_sparsity.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:09.026621] 2025-09-07T07:08:12.7467441Z 2025-09-07T07:08:12.7468626Z test_ao_sparsity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ao_sparsity_1.1_58e0d415b8564a9a_.log 2025-09-07T07:08:12.7481095Z Running 0 items in this shard: 2025-09-07T07:08:12.7481358Z 2025-09-07T07:08:12.7481589Z Running inductor/test_async_compile 1/1 ... [2025-09-07 07:08:12.747210] 2025-09-07T07:08:12.7481991Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:12.7482964Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_async_compile.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:12.747530] 2025-09-07T07:08:19.4215045Z 2025-09-07T07:08:19.4216312Z inductor/test_async_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_async_compile_1.1_4ea0631df2ec9111_.log 2025-09-07T07:08:19.4217565Z Running 0 items in this shard: 2025-09-07T07:08:19.4217856Z 2025-09-07T07:08:19.4226356Z Running dynamo/test_nops 1/1 ... 
[2025-09-07 07:08:19.421923] 2025-09-07T07:08:19.4227488Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:19.4229237Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_nops.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:19.422261] 2025-09-07T07:08:22.8917654Z 2025-09-07T07:08:22.8918651Z dynamo/test_nops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nops_1.1_9d42ebefe151db51_.log 2025-09-07T07:08:22.8919833Z Running 0 items in this shard: 2025-09-07T07:08:22.8920205Z 2025-09-07T07:08:22.8923887Z Running torch_np/test_nep50_examples 1/1 ... [2025-09-07 07:08:22.892152] 2025-09-07T07:08:22.8924398Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:22.8927100Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_nep50_examples.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:22.892459] 2025-09-07T07:08:26.4120791Z 2025-09-07T07:08:26.4121952Z torch_np/test_nep50_examples 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_nep50_examples_1.1_73954e8d53d5dd1b_.log 2025-09-07T07:08:26.4123171Z Running 0 items in this shard: 2025-09-07T07:08:26.4123449Z 2025-09-07T07:08:26.4123945Z Running torch_np/test_binary_ufuncs 1/1 ... [2025-09-07 07:08:26.412189] 2025-09-07T07:08:26.4124619Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:26.4128689Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_binary_ufuncs.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:08:26.412516] 2025-09-07T07:08:29.6819334Z 2025-09-07T07:08:29.6820652Z torch_np/test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_binary_ufuncs_1.1_d3688003d9b936af_.log 2025-09-07T07:08:29.6822053Z Running 0 items in this shard: 2025-09-07T07:08:29.6822349Z 2025-09-07T07:08:29.6825332Z Running inductor/test_best_config 1/1 ... [2025-09-07 07:08:29.682352] 2025-09-07T07:08:29.6826011Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:29.6829196Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_best_config.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:29.682672] 2025-09-07T07:08:36.4569597Z 2025-09-07T07:08:36.4571426Z inductor/test_best_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_best_config_1.1_70a7f3ad2972a9c6_.log 2025-09-07T07:08:36.4572653Z Running 0 items in this shard: 2025-09-07T07:08:36.4572944Z 2025-09-07T07:08:36.4580642Z Running test_hop_infra 1/1 ... [2025-09-07 07:08:36.457266] 2025-09-07T07:08:36.4581065Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:36.4582000Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hop_infra.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:36.457631] 2025-09-07T07:08:40.5782675Z 2025-09-07T07:08:40.5783675Z test_hop_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hop_infra_1.1_add81ad5cbd21120_.log 2025-09-07T07:08:40.5784870Z Running 0 items in this shard: 2025-09-07T07:08:40.5785246Z 2025-09-07T07:08:40.5790055Z Running torch_np/test_unary_ufuncs 1/1 ... 
[2025-09-07 07:08:40.578752] 2025-09-07T07:08:40.5790743Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:40.5793714Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_unary_ufuncs.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:40.579124] 2025-09-07T07:08:43.8485045Z 2025-09-07T07:08:43.8486053Z torch_np/test_unary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_unary_ufuncs_1.1_bdb00697afab423f_.log 2025-09-07T07:08:43.8487917Z Running 0 items in this shard: 2025-09-07T07:08:43.8488245Z 2025-09-07T07:08:43.8491823Z Running inductor/test_aot_inductor_package 1/1 ... [2025-09-07 07:08:43.848952] 2025-09-07T07:08:43.8492531Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:43.8495751Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_package.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:43.849291] 2025-09-07T07:08:50.4733388Z 2025-09-07T07:08:50.4734913Z inductor/test_aot_inductor_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_package_1.1_9a07485b2e98b678_.log 2025-09-07T07:08:50.4736200Z Running 0 items in this shard: 2025-09-07T07:08:50.4736519Z 2025-09-07T07:08:50.4739174Z Running inductor/test_pad_mm 1/1 ... 
[2025-09-07 07:08:50.473701] 2025-09-07T07:08:50.4739796Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:50.4743350Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_pad_mm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:50.474039] 2025-09-07T07:08:57.1981397Z 2025-09-07T07:08:57.1982492Z inductor/test_pad_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_pad_mm_1.1_8c35c0f7bfdc3808_.log 2025-09-07T07:08:57.1983642Z Running 0 items in this shard: 2025-09-07T07:08:57.1983934Z 2025-09-07T07:08:57.1986144Z Running typing/test_python_operators 1/1 ... [2025-09-07 07:08:57.198337] 2025-09-07T07:08:57.1986895Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:08:57.1989448Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'typing/test_python_operators.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:08:57.198676] 2025-09-07T07:09:00.4680324Z 2025-09-07T07:09:00.4682155Z typing/test_python_operators 1/1 was successful, full logs can be found in artifacts with path test/test-reports/typing.test_python_operators_1.1_6dd60956b98e5cc1_.log 2025-09-07T07:09:00.4683431Z Running 0 items in this shard: 2025-09-07T07:09:00.4683723Z 2025-09-07T07:09:00.4687283Z Running inductor/test_aot_inductor_custom_ops 1/1 ... 
[2025-09-07 07:09:00.468504] 2025-09-07T07:09:00.4688013Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:00.4691570Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_custom_ops.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:00.468847] 2025-09-07T07:09:07.5933675Z 2025-09-07T07:09:07.5935303Z inductor/test_aot_inductor_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_custom_ops_1.1_86efdfd888617911_.log 2025-09-07T07:09:07.5936867Z Running 0 items in this shard: 2025-09-07T07:09:07.5937243Z 2025-09-07T07:09:07.5940797Z Running inductor/test_cudagraph_trees 1/1 ... [2025-09-07 07:09:07.593818] 2025-09-07T07:09:07.5941501Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:07.5944556Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudagraph_trees.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:07.594203] 2025-09-07T07:09:14.3184754Z 2025-09-07T07:09:14.3185444Z inductor/test_cudagraph_trees 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudagraph_trees_1.1_fb2a36b662912fc6_.log 2025-09-07T07:09:14.3186456Z Running 0 items in this shard: 2025-09-07T07:09:14.3186634Z 2025-09-07T07:09:14.3190388Z Running inductor/test_compile_worker 1/1 ... 
[2025-09-07 07:09:14.318840] 2025-09-07T07:09:14.3190801Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:14.3193909Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_worker.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:14.319173] 2025-09-07T07:09:21.0433598Z 2025-09-07T07:09:21.0434846Z inductor/test_compile_worker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_worker_1.1_0c97c7b1843782bf_.log 2025-09-07T07:09:21.0436074Z Running 0 items in this shard: 2025-09-07T07:09:21.0436411Z 2025-09-07T07:09:21.0437418Z Running dynamo/test_modules 1/1 ... [2025-09-07 07:09:21.043529] 2025-09-07T07:09:21.0438037Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:21.0441992Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_modules.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:21.043899] 2025-09-07T07:09:28.1184484Z 2025-09-07T07:09:28.1185837Z dynamo/test_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_modules_1.1_1522b8ed4b0264f0_.log 2025-09-07T07:09:28.1187160Z Running 0 items in this shard: 2025-09-07T07:09:28.1187496Z 2025-09-07T07:09:28.1191011Z Running test_transformers 1/1 ... [2025-09-07 07:09:28.118884] 2025-09-07T07:09:28.1191612Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:28.1194712Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:09:28.119225] 2025-09-07T07:09:35.5943832Z 2025-09-07T07:09:35.5945517Z test_transformers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_1.1_06cffe2643579102_.log 2025-09-07T07:09:35.5946818Z Running 0 items in this shard: 2025-09-07T07:09:35.5947174Z 2025-09-07T07:09:35.5950282Z Running dynamo/test_global 1/1 ... [2025-09-07 07:09:35.594840] 2025-09-07T07:09:35.5950877Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:35.5954448Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_global.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:35.595214] 2025-09-07T07:09:39.0147482Z 2025-09-07T07:09:39.0148551Z dynamo/test_global 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_global_1.1_e8ff3e8a2a8e3dcb_.log 2025-09-07T07:09:39.0149635Z Running 0 items in this shard: 2025-09-07T07:09:39.0149931Z 2025-09-07T07:09:39.0153580Z Running export/test_export 1/1 ... [2025-09-07 07:09:39.015173] 2025-09-07T07:09:39.0154246Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:39.0157491Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_export.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:39.015497] 2025-09-07T07:09:46.0401384Z 2025-09-07T07:09:46.0402482Z export/test_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_1.1_b652060c8356ff9c_.log 2025-09-07T07:09:46.0404545Z Running 0 items in this shard: 2025-09-07T07:09:46.0404896Z 2025-09-07T07:09:46.0408275Z Running test_foreach 1/1 ... 
[2025-09-07 07:09:46.040593] 2025-09-07T07:09:46.0408838Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:46.0412621Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_foreach.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:46.040978] 2025-09-07T07:09:54.6678374Z 2025-09-07T07:09:54.6679298Z test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_foreach_1.1_4d93a6e6a53d4aaf_.log 2025-09-07T07:09:54.6680285Z Running 0 items in this shard: 2025-09-07T07:09:54.6680561Z 2025-09-07T07:09:54.6683043Z Running test_appending_byte_serializer 1/1 ... [2025-09-07 07:09:54.668055] 2025-09-07T07:09:54.6683723Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:54.6686994Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_appending_byte_serializer.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:09:54.668404] 2025-09-07T07:09:57.8878269Z 2025-09-07T07:09:57.8879775Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_0577e93bec70e6ec_.log 2025-09-07T07:09:57.8880989Z Running 0 items in this shard: 2025-09-07T07:09:57.8881269Z 2025-09-07T07:09:57.8882485Z Running test_fx_experimental 1/1 ... [2025-09-07 07:09:57.888070] 2025-09-07T07:09:57.8883093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:09:57.8886668Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_experimental.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:09:57.888406] 2025-09-07T07:10:03.2107249Z 2025-09-07T07:10:03.2108159Z test_fx_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_experimental_1.1_36044bdebd69d351_.log 2025-09-07T07:10:03.2109458Z Running 0 items in this shard: 2025-09-07T07:10:03.2109748Z 2025-09-07T07:10:03.2112252Z Running inductor/test_triton_wrapper 1/1 ... [2025-09-07 07:10:03.210994] 2025-09-07T07:10:03.2112944Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:03.2115768Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_wrapper.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:03.211325] 2025-09-07T07:10:09.9855294Z 2025-09-07T07:10:09.9856684Z inductor/test_triton_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_wrapper_1.1_d45a80f6e149cf2e_.log 2025-09-07T07:10:09.9858218Z Running 0 items in this shard: 2025-09-07T07:10:09.9858566Z 2025-09-07T07:10:09.9862175Z Running inductor/test_torchinductor_strided_blocks 1/1 ... [2025-09-07 07:10:09.985978] 2025-09-07T07:10:09.9862929Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:09.9865966Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_strided_blocks.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:10:09.986334] 2025-09-07T07:10:17.1109938Z 2025-09-07T07:10:17.1111310Z inductor/test_torchinductor_strided_blocks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_strided_blocks_1.1_1c1e6c36cfe52c56_.log 2025-09-07T07:10:17.1112701Z Running 0 items in this shard: 2025-09-07T07:10:17.1113606Z 2025-09-07T07:10:17.1113907Z Running test_file_check 1/1 ... [2025-09-07 07:10:17.111194] 2025-09-07T07:10:17.1114496Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:17.1117975Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_file_check.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:17.111521] 2025-09-07T07:10:20.3309311Z 2025-09-07T07:10:20.3310135Z test_file_check 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_file_check_1.1_ebc21c556f7da9d5_.log 2025-09-07T07:10:20.3310903Z Running 0 items in this shard: 2025-09-07T07:10:20.3311118Z 2025-09-07T07:10:20.3319008Z Running dynamo/test_interop 1/1 ... [2025-09-07 07:10:20.331306] 2025-09-07T07:10:20.3319720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:20.3321318Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_interop.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:20.331664] 2025-09-07T07:10:23.7513358Z 2025-09-07T07:10:23.7514349Z dynamo/test_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_interop_1.1_cf51ef1949abad71_.log 2025-09-07T07:10:23.7515486Z Running 0 items in this shard: 2025-09-07T07:10:23.7515801Z 2025-09-07T07:10:23.7518694Z Running dynamo/test_metrics_context 1/1 ... 
[2025-09-07 07:10:23.751697] 2025-09-07T07:10:23.7519196Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:23.7522138Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_metrics_context.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:23.752009] 2025-09-07T07:10:27.0713326Z 2025-09-07T07:10:27.0714546Z dynamo/test_metrics_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_metrics_context_1.1_bc05d21a1f3dd321_.log 2025-09-07T07:10:27.0715755Z Running 0 items in this shard: 2025-09-07T07:10:27.0716041Z 2025-09-07T07:10:27.0716951Z Running test_functionalization 1/1 ... [2025-09-07 07:10:27.071446] 2025-09-07T07:10:27.0717675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:27.0720058Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functionalization.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:27.071747] 2025-09-07T07:10:30.3910924Z 2025-09-07T07:10:30.3912342Z test_functionalization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functionalization_1.1_0c9bb4e4ce0888d5_.log 2025-09-07T07:10:30.3913794Z Running 0 items in this shard: 2025-09-07T07:10:30.3914142Z 2025-09-07T07:10:30.3917557Z Running dynamo/test_inline_and_install 1/1 ... 
[2025-09-07 07:10:30.391568] 2025-09-07T07:10:30.3918064Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:30.3921105Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_inline_and_install.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:30.391868] 2025-09-07T07:10:34.6624088Z 2025-09-07T07:10:34.6625425Z dynamo/test_inline_and_install 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_inline_and_install_1.1_977062bee3b9b13f_.log 2025-09-07T07:10:34.6626771Z Running 0 items in this shard: 2025-09-07T07:10:34.6627012Z 2025-09-07T07:10:34.6629464Z Running inductor/test_smoke 1/1 ... [2025-09-07 07:10:34.662788] 2025-09-07T07:10:34.6630304Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:34.6633077Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_smoke.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:34.663080] 2025-09-07T07:10:41.2867594Z 2025-09-07T07:10:41.2869247Z inductor/test_smoke 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_smoke_1.1_de34c7803d9ac05c_.log 2025-09-07T07:10:41.2870472Z 2025-09-07T07:10:41.2877074Z Running torch_np/test_ufuncs_basic 1/1 ... [2025-09-07 07:10:41.287034] 2025-09-07T07:10:41.2877747Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:41.2879464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ufuncs_basic.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:10:41.287352] 2025-09-07T07:10:44.5564979Z 2025-09-07T07:10:44.5566052Z torch_np/test_ufuncs_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ufuncs_basic_1.1_6d1464327c4ad734_.log 2025-09-07T07:10:44.5567229Z Running 0 items in this shard: 2025-09-07T07:10:44.5567511Z 2025-09-07T07:10:44.5571983Z Running test_proxy_tensor 1/1 ... [2025-09-07 07:10:44.556944] 2025-09-07T07:10:44.5572703Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:44.5576439Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_proxy_tensor.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:44.557469] 2025-09-07T07:10:49.7793475Z 2025-09-07T07:10:49.7794585Z test_proxy_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_proxy_tensor_1.1_acab9780c788848f_.log 2025-09-07T07:10:49.7795747Z Running 0 items in this shard: 2025-09-07T07:10:49.7796043Z 2025-09-07T07:10:49.7798225Z Running inductor/test_fx_fusion 1/1 ... [2025-09-07 07:10:49.779611] 2025-09-07T07:10:49.7798851Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:49.7802043Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fx_fusion.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:49.779931] 2025-09-07T07:10:54.4511524Z 2025-09-07T07:10:54.4512584Z inductor/test_fx_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fx_fusion_1.1_9bbbad95092278e4_.log 2025-09-07T07:10:54.4513794Z Running 0 items in this shard: 2025-09-07T07:10:54.4514092Z 2025-09-07T07:10:54.4516643Z Running inductor/test_move_constructors_to_cuda 1/1 ... 
[2025-09-07 07:10:54.451474] 2025-09-07T07:10:54.4517444Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:10:54.4520207Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_move_constructors_to_cuda.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:10:54.451805] 2025-09-07T07:11:01.2258119Z 2025-09-07T07:11:01.2259538Z inductor/test_move_constructors_to_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_move_constructors_to_cuda_1.1_6cba56396ad761df_.log 2025-09-07T07:11:01.2260934Z Running 0 items in this shard: 2025-09-07T07:11:01.2261242Z 2025-09-07T07:11:01.2261901Z Running dynamo/test_skip_non_tensor 1/1 ... [2025-09-07 07:11:01.226023] 2025-09-07T07:11:01.2262550Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:01.2266207Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_non_tensor.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:01.226340] 2025-09-07T07:11:04.7959848Z 2025-09-07T07:11:04.7960979Z dynamo/test_skip_non_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_non_tensor_1.1_0aace14818f505e8_.log 2025-09-07T07:11:04.7962791Z Running 0 items in this shard: 2025-09-07T07:11:04.7963144Z 2025-09-07T07:11:04.7966793Z Running export/test_tree_utils 1/1 ... 
[2025-09-07 07:11:04.796417] 2025-09-07T07:11:04.7967516Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:04.7970983Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tree_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:04.796850] 2025-09-07T07:11:08.0659812Z 2025-09-07T07:11:08.0660894Z export/test_tree_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tree_utils_1.1_cd1a257d95483678_.log 2025-09-07T07:11:08.0662077Z Running 0 items in this shard: 2025-09-07T07:11:08.0662372Z 2025-09-07T07:11:08.0666775Z Running dynamo/test_frame_init 1/1 ... [2025-09-07 07:11:08.066461] 2025-09-07T07:11:08.0667566Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:08.0671313Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_frame_init.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:08.066960] 2025-09-07T07:11:11.3862563Z 2025-09-07T07:11:11.3864030Z dynamo/test_frame_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_frame_init_1.1_99a5963cdb182774_.log 2025-09-07T07:11:11.3865272Z Running 0 items in this shard: 2025-09-07T07:11:11.3865563Z 2025-09-07T07:11:11.3865901Z Running torch_np/test_dtype 1/1 ... [2025-09-07 07:11:11.386433] 2025-09-07T07:11:11.3866500Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:11.3871117Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_dtype.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:11:11.386732] 2025-09-07T07:11:14.7060528Z 2025-09-07T07:11:14.7061775Z torch_np/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_dtype_1.1_7cb2571ffe212b4e_.log 2025-09-07T07:11:14.7063100Z Running 0 items in this shard: 2025-09-07T07:11:14.7063444Z 2025-09-07T07:11:14.7067094Z Running inductor/test_indexing 1/1 ... [2025-09-07 07:11:14.706477] 2025-09-07T07:11:14.7067753Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:14.7070196Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_indexing.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:14.706779] 2025-09-07T07:11:21.5309012Z 2025-09-07T07:11:21.5310231Z inductor/test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_indexing_1.1_51dea3188804992d_.log 2025-09-07T07:11:21.5310951Z Running 0 items in this shard: 2025-09-07T07:11:21.5311121Z 2025-09-07T07:11:21.5314158Z Running inductor/test_minifier_utils 1/1 ... [2025-09-07 07:11:21.531223] 2025-09-07T07:11:21.5314574Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:21.5317372Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:21.531534] 2025-09-07T07:11:25.0510917Z 2025-09-07T07:11:25.0511996Z inductor/test_minifier_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_utils_1.1_22b1d898ae69b04b_.log 2025-09-07T07:11:25.0513240Z Running 0 items in this shard: 2025-09-07T07:11:25.0513517Z 2025-09-07T07:11:25.0516488Z Running test_typing 1/1 ... 
[2025-09-07 07:11:25.051396] 2025-09-07T07:11:25.0517192Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:25.0518750Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_typing.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:25.051704] 2025-09-07T07:11:28.3709971Z 2025-09-07T07:11:28.3710915Z test_typing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_typing_1.1_f0904b206bec997e_.log 2025-09-07T07:11:28.3711966Z Running 0 items in this shard: 2025-09-07T07:11:28.3712247Z 2025-09-07T07:11:28.3715282Z Running functorch/test_aot_joint_with_descriptors 1/1 ... [2025-09-07 07:11:28.371302] 2025-09-07T07:11:28.3716000Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:28.3718602Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aot_joint_with_descriptors.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:28.371599] 2025-09-07T07:11:31.8913265Z 2025-09-07T07:11:31.8914308Z functorch/test_aot_joint_with_descriptors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aot_joint_with_descriptors_1.1_ea2dace91ece50e1_.log 2025-09-07T07:11:31.8915382Z Running 0 items in this shard: 2025-09-07T07:11:31.8915646Z 2025-09-07T07:11:31.8917665Z Running test_utils_filelock 1/1 ... 
[2025-09-07 07:11:31.891591] 2025-09-07T07:11:31.8918301Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:31.8922101Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_utils_filelock.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:31.891901] 2025-09-07T07:11:35.1109965Z 2025-09-07T07:11:35.1111050Z test_utils_filelock 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_filelock_1.1_fa455430b10a0053_.log 2025-09-07T07:11:35.1112161Z Running 0 items in this shard: 2025-09-07T07:11:35.1112459Z 2025-09-07T07:11:35.1114811Z Running inductor/test_torchinductor 1/1 ... [2025-09-07 07:11:35.111312] 2025-09-07T07:11:35.1115506Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:35.1118300Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:35.111613] 2025-09-07T07:11:42.4365609Z 2025-09-07T07:11:42.4367173Z inductor/test_torchinductor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_1.1_6ab12b1e61f8b7f6_.log 2025-09-07T07:11:42.4369262Z Running 1 items in this shard: test/inductor/test_torchinductor.py::GPUTests::test_large_block_sizes_cuda 2025-09-07T07:11:42.4370005Z 2025-09-07T07:11:42.4371994Z Running inductor/test_metrics 1/1 ... 
[2025-09-07 07:11:42.436986] 2025-09-07T07:11:42.4372630Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:42.4375957Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_metrics.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:42.437309] 2025-09-07T07:11:49.2112647Z 2025-09-07T07:11:49.2113838Z inductor/test_metrics 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_metrics_1.1_7d3fed82d826455c_.log 2025-09-07T07:11:49.2115000Z Running 0 items in this shard: 2025-09-07T07:11:49.2115346Z 2025-09-07T07:11:49.2117524Z Running inductor/test_coordinate_descent_tuner 1/1 ... [2025-09-07 07:11:49.211489] 2025-09-07T07:11:49.2118527Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:49.2120663Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_coordinate_descent_tuner.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:49.211810] 2025-09-07T07:11:56.0358684Z 2025-09-07T07:11:56.0360017Z inductor/test_coordinate_descent_tuner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_coordinate_descent_tuner_1.1_5d4551f2a4733d04_.log 2025-09-07T07:11:56.0361384Z Running 0 items in this shard: 2025-09-07T07:11:56.0361661Z 2025-09-07T07:11:56.0365481Z Running inductor/test_foreach 1/1 ... 
[2025-09-07 07:11:56.036287] 2025-09-07T07:11:56.0366206Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:11:56.0369267Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_foreach.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:11:56.036753] 2025-09-07T07:12:03.2615372Z 2025-09-07T07:12:03.2616600Z inductor/test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_foreach_1.1_6f1a4168591e65dc_.log 2025-09-07T07:12:03.2617769Z Running 0 items in this shard: 2025-09-07T07:12:03.2618099Z 2025-09-07T07:12:03.2619185Z Running backends/xeon/test_launch 1/1 ... [2025-09-07 07:12:03.261729] 2025-09-07T07:12:03.2619838Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:03.2630988Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'backends/xeon/test_launch.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:12:03.262048] 2025-09-07T07:12:06.5312938Z 2025-09-07T07:12:06.5314462Z backends/xeon/test_launch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/backends.xeon.test_launch_1.1_a50d01abcab9b714_.log 2025-09-07T07:12:06.5315683Z Running 0 items in this shard: 2025-09-07T07:12:06.5315957Z 2025-09-07T07:12:06.5318681Z Running dynamo/test_functions 1/1 ... [2025-09-07 07:12:06.531692] 2025-09-07T07:12:06.5319313Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:06.5322384Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_functions.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:12:06.531992] 2025-09-07T07:12:13.9569229Z 2025-09-07T07:12:13.9570190Z dynamo/test_functions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_functions_1.1_46ca670323a42619_.log 2025-09-07T07:12:13.9571341Z Running 0 items in this shard: 2025-09-07T07:12:13.9571687Z 2025-09-07T07:12:13.9576481Z Running inductor/test_torchinductor_opinfo 1/12 ... [2025-09-07 07:12:13.957431] 2025-09-07T07:12:13.9577017Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:13.9580212Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=1', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:12:13.957825] 2025-09-07T07:12:23.1351220Z 2025-09-07T07:12:23.1352591Z inductor/test_torchinductor_opinfo 1/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_1.12_1a6db45a89df0035_.log 2025-09-07T07:12:23.1353903Z Running 0 items in this shard: 2025-09-07T07:12:23.1354241Z 2025-09-07T07:12:23.1360880Z Running inductor/test_torchinductor_opinfo 4/12 ... [2025-09-07 07:12:23.135529] 2025-09-07T07:12:23.1361508Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:23.1362522Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=4', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:12:23.136028] 2025-09-07T07:12:32.1631184Z 2025-09-07T07:12:32.1632526Z inductor/test_torchinductor_opinfo 4/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_4.12_0ca0af2bb943af0f_.log 2025-09-07T07:12:32.1633886Z Running 0 items in this shard: 2025-09-07T07:12:32.1634176Z 2025-09-07T07:12:32.1635324Z Running inductor/test_torchinductor_opinfo 5/12 ... [2025-09-07 07:12:32.163311] 2025-09-07T07:12:32.1636052Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:32.1638684Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=5', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:12:32.163675] 2025-09-07T07:12:41.2408260Z 2025-09-07T07:12:41.2409906Z inductor/test_torchinductor_opinfo 5/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_5.12_8e2589fd78173698_.log 2025-09-07T07:12:41.2411551Z Running 0 items in this shard: 2025-09-07T07:12:41.2411911Z 2025-09-07T07:12:41.2414518Z Running inductor/test_torchinductor_opinfo 8/12 ... [2025-09-07 07:12:41.241257] 2025-09-07T07:12:41.2415241Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:41.2419307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=8', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:12:41.241602] 2025-09-07T07:12:50.3184948Z 2025-09-07T07:12:50.3185876Z inductor/test_torchinductor_opinfo 8/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_8.12_a6eac216e23a474c_.log 2025-09-07T07:12:50.3186673Z Running 0 items in this shard: 2025-09-07T07:12:50.3186842Z 2025-09-07T07:12:50.3193247Z Running inductor/test_torchinductor_opinfo 9/12 ... [2025-09-07 07:12:50.318935] 2025-09-07T07:12:50.3193766Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:50.3195160Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=9', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:12:50.319288] 2025-09-07T07:12:59.3963970Z 2025-09-07T07:12:59.3965534Z inductor/test_torchinductor_opinfo 9/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_9.12_1dc23e7e21744605_.log 2025-09-07T07:12:59.3966851Z Running 0 items in this shard: 2025-09-07T07:12:59.3967165Z 2025-09-07T07:12:59.3969963Z Running inductor/test_torchinductor_opinfo 12/12 ... [2025-09-07 07:12:59.396719] 2025-09-07T07:12:59.3970687Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:59.3972889Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=12', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:12:59.397045] 2025-09-07T07:13:08.4743182Z 2025-09-07T07:13:08.4744817Z inductor/test_torchinductor_opinfo 12/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.12_dbc60a08497f0d9b_.log 2025-09-07T07:13:08.4747003Z Running 0 items in this shard: 2025-09-07T07:13:08.4747398Z 2025-09-07T07:13:08.4749571Z Running dynamo/test_dicts 1/1 ... [2025-09-07 07:13:08.474760] 2025-09-07T07:13:08.4749941Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:08.4752754Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dicts.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:08.475079] 2025-09-07T07:13:12.0948934Z 2025-09-07T07:13:12.0949687Z dynamo/test_dicts 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dicts_1.1_88de2ac782c92a8a_.log 2025-09-07T07:13:12.0950490Z Running 0 items in this shard: 2025-09-07T07:13:12.0950724Z 2025-09-07T07:13:12.0954081Z Running dynamo/test_sdpa 1/1 ... [2025-09-07 07:13:12.095287] 2025-09-07T07:13:12.0954470Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:12.0957559Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sdpa.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:12.095585] 2025-09-07T07:13:15.6651990Z 2025-09-07T07:13:15.6653212Z dynamo/test_sdpa 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sdpa_1.1_d0e8b91c85582677_.log 2025-09-07T07:13:15.6654463Z Running 0 items in this shard: 2025-09-07T07:13:15.6654782Z 2025-09-07T07:13:15.6657774Z Running dynamo/test_list 1/1 ... 
[2025-09-07 07:13:15.665536] 2025-09-07T07:13:15.6658378Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:15.6661195Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_list.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:15.665856] 2025-09-07T07:13:19.2353907Z 2025-09-07T07:13:19.2354991Z dynamo/test_list 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_list_1.1_68b08753c05166a3_.log 2025-09-07T07:13:19.2356055Z Running 0 items in this shard: 2025-09-07T07:13:19.2356338Z 2025-09-07T07:13:19.2360797Z Running inductor/test_autoheuristic 1/1 ... [2025-09-07 07:13:19.235841] 2025-09-07T07:13:19.2361347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:19.2364650Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_autoheuristic.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:19.236252] 2025-09-07T07:13:26.2106117Z 2025-09-07T07:13:26.2107060Z inductor/test_autoheuristic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_autoheuristic_1.1_92742f64af634e6a_.log 2025-09-07T07:13:26.2107962Z Running 0 items in this shard: 2025-09-07T07:13:26.2108178Z 2025-09-07T07:13:26.2111775Z Running test_flop_counter 1/1 ... [2025-09-07 07:13:26.211005] 2025-09-07T07:13:26.2112235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:26.2115183Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_flop_counter.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:13:26.211321] 2025-09-07T07:13:29.9812021Z 2025-09-07T07:13:29.9813282Z test_flop_counter 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_flop_counter_1.1_4fbbf38f50037f53_.log 2025-09-07T07:13:29.9814724Z Running 0 items in this shard: 2025-09-07T07:13:29.9815009Z 2025-09-07T07:13:29.9818375Z Running dynamo/test_fx_graph_runnable 1/1 ... [2025-09-07 07:13:29.981585] 2025-09-07T07:13:29.9819574Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:29.9821304Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_graph_runnable.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:29.981901] 2025-09-07T07:13:36.9562513Z 2025-09-07T07:13:36.9563884Z dynamo/test_fx_graph_runnable 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_graph_runnable_1.1_f720c4a35cad77ae_.log 2025-09-07T07:13:36.9565382Z Running 0 items in this shard: 2025-09-07T07:13:36.9565716Z 2025-09-07T07:13:36.9569048Z Running inductor/test_ordered_set 1/1 ... [2025-09-07 07:13:36.956670] 2025-09-07T07:13:36.9569703Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:36.9572598Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_ordered_set.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:13:36.956996] 2025-09-07T07:13:40.3263609Z 2025-09-07T07:13:40.3264715Z inductor/test_ordered_set 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_ordered_set_1.1_7db158c88826e9b9_.log 2025-09-07T07:13:40.3265956Z Running 0 items in this shard: 2025-09-07T07:13:40.3266303Z 2025-09-07T07:13:40.3270514Z Running dynamo/test_recompiles 1/1 ... [2025-09-07 07:13:40.326823] 2025-09-07T07:13:40.3271038Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:40.3274337Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_recompiles.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:40.327191] 2025-09-07T07:13:43.8967325Z 2025-09-07T07:13:43.8968568Z dynamo/test_recompiles 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompiles_1.1_744a854c563500cf_.log 2025-09-07T07:13:43.8969700Z Running 0 items in this shard: 2025-09-07T07:13:43.8969978Z 2025-09-07T07:13:43.8971894Z Running test_per_overload_api 1/1 ... [2025-09-07 07:13:43.896972] 2025-09-07T07:13:43.8972592Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:43.8975174Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_per_overload_api.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:43.897263] 2025-09-07T07:13:47.1664514Z 2025-09-07T07:13:47.1665787Z test_per_overload_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_per_overload_api_1.1_32e756ab08e0c4c9_.log 2025-09-07T07:13:47.1667161Z Running 0 items in this shard: 2025-09-07T07:13:47.1667530Z 2025-09-07T07:13:47.1671008Z Running inductor/test_xpu_basic 1/1 ... 
[2025-09-07 07:13:47.166902] 2025-09-07T07:13:47.1671482Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:47.1674155Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_xpu_basic.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:47.167214] 2025-09-07T07:13:54.1415876Z 2025-09-07T07:13:54.1417496Z inductor/test_xpu_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_xpu_basic_1.1_ddd785dcb9ad66f2_.log 2025-09-07T07:13:54.1418696Z 2025-09-07T07:13:54.1422353Z Running export/test_cpp_serdes 1/1 ... [2025-09-07 07:13:54.142062] 2025-09-07T07:13:54.1422998Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:54.1426656Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_cpp_serdes.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:54.142409] 2025-09-07T07:14:01.2669551Z 2025-09-07T07:14:01.2670800Z export/test_cpp_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_cpp_serdes_1.1_6f03687a984cdaf7_.log 2025-09-07T07:14:01.2671942Z Running 0 items in this shard: 2025-09-07T07:14:01.2672241Z 2025-09-07T07:14:01.2675324Z Running inductor/test_utils 1/1 ... [2025-09-07 07:14:01.267353] 2025-09-07T07:14:01.2675736Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:01.2678589Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:14:01.267676] 2025-09-07T07:14:04.9876136Z 2025-09-07T07:14:04.9877255Z inductor/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_utils_1.1_020729ef75038f61_.log 2025-09-07T07:14:04.9878401Z Running 0 items in this shard: 2025-09-07T07:14:04.9878697Z 2025-09-07T07:14:04.9880686Z Running inductor/test_cuda_repro 1/1 ... [2025-09-07 07:14:04.987888] 2025-09-07T07:14:04.9881349Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:04.9884022Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_repro.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:04.988192] 2025-09-07T07:14:12.1628308Z 2025-09-07T07:14:12.1630061Z inductor/test_cuda_repro 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cuda_repro_1.1_13447569ac5fe955_.log 2025-09-07T07:14:12.1633035Z Running 0 items in this shard: 2025-09-07T07:14:12.1633440Z 2025-09-07T07:14:12.1633716Z Running test_pytree 1/1 ... [2025-09-07 07:14:12.163039] 2025-09-07T07:14:12.1634290Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:12.1636593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_pytree.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:12.163356] 2025-09-07T07:14:15.4826130Z 2025-09-07T07:14:15.4827008Z test_pytree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_pytree_1.1_534b1c779912b91e_.log 2025-09-07T07:14:15.4827994Z Running 0 items in this shard: 2025-09-07T07:14:15.4828272Z 2025-09-07T07:14:15.4830341Z Running inductor/test_fp8 1/1 ... 
[2025-09-07 07:14:15.482858] 2025-09-07T07:14:15.4830955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:15.4833892Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:15.483162] 2025-09-07T07:14:22.4073716Z 2025-09-07T07:14:22.4075066Z inductor/test_fp8 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fp8_1.1_7f2e488b618df83f_.log 2025-09-07T07:14:22.4076321Z Running 0 items in this shard: 2025-09-07T07:14:22.4077318Z 2025-09-07T07:14:22.4079452Z Running dynamo/test_nested_graph_breaks 1/1 ... [2025-09-07 07:14:22.407728] 2025-09-07T07:14:22.4080136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:22.4083216Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_nested_graph_breaks.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:22.408048] 2025-09-07T07:14:25.9776214Z 2025-09-07T07:14:25.9777394Z dynamo/test_nested_graph_breaks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nested_graph_breaks_1.1_ad34e9c453e3885f_.log 2025-09-07T07:14:25.9778616Z Running 0 items in this shard: 2025-09-07T07:14:25.9778936Z 2025-09-07T07:14:25.9783684Z Running dynamo/test_pre_dispatch 1/1 ... [2025-09-07 07:14:25.978104] 2025-09-07T07:14:25.9784267Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:25.9787588Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_pre_dispatch.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:14:25.978552] 2025-09-07T07:14:29.2978200Z 2025-09-07T07:14:29.2979452Z dynamo/test_pre_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_pre_dispatch_1.1_6bd1b209f5e4c858_.log 2025-09-07T07:14:29.2980636Z Running 0 items in this shard: 2025-09-07T07:14:29.2980919Z 2025-09-07T07:14:29.2984454Z Running dynamo/test_fx_passes_pre_grad 1/1 ... [2025-09-07 07:14:29.298292] 2025-09-07T07:14:29.2984842Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:29.2987654Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_passes_pre_grad.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:29.298595] 2025-09-07T07:14:32.6180808Z 2025-09-07T07:14:32.6181887Z dynamo/test_fx_passes_pre_grad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_edc233ac18676351_.log 2025-09-07T07:14:32.6183091Z Running 0 items in this shard: 2025-09-07T07:14:32.6184232Z 2025-09-07T07:14:32.6184886Z Running inductor/test_combo_kernels 1/1 ... [2025-09-07 07:14:32.618272] 2025-09-07T07:14:32.6185596Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:32.6187545Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_combo_kernels.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:14:32.618564] 2025-09-07T07:14:39.8431359Z 2025-09-07T07:14:39.8432601Z inductor/test_combo_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_combo_kernels_1.1_585c81aedab4569f_.log 2025-09-07T07:14:39.8433847Z Running 0 items in this shard: 2025-09-07T07:14:39.8434124Z 2025-09-07T07:14:39.8436422Z Running inductor/test_gpu_cpp_wrapper 1/1 ... [2025-09-07 07:14:39.843466] 2025-09-07T07:14:39.8437072Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:39.8440467Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:39.843794] 2025-09-07T07:14:47.6693526Z 2025-09-07T07:14:47.6694927Z inductor/test_gpu_cpp_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_89e9998583f8fae6_.log 2025-09-07T07:14:47.6696124Z Running 0 items in this shard: 2025-09-07T07:14:47.6697039Z 2025-09-07T07:14:47.6699326Z Running inductor/test_device_assert 1/1 ... [2025-09-07 07:14:47.669666] 2025-09-07T07:14:47.6699987Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:47.6702025Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_device_assert.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:14:47.669989] 2025-09-07T07:14:54.4942709Z 2025-09-07T07:14:54.4943918Z inductor/test_device_assert 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_device_assert_1.1_13835b4fd229c0af_.log 2025-09-07T07:14:54.4945120Z Running 0 items in this shard: 2025-09-07T07:14:54.4945403Z 2025-09-07T07:14:54.4951839Z Running inductor/test_op_completeness 1/1 ... [2025-09-07 07:14:54.494575] 2025-09-07T07:14:54.4952260Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:54.4953268Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_completeness.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:54.494903] 2025-09-07T07:14:58.1647756Z 2025-09-07T07:14:58.1649057Z inductor/test_op_completeness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_completeness_1.1_c79d92d6d9c53c67_.log 2025-09-07T07:14:58.1652119Z Running 0 items in this shard: 2025-09-07T07:14:58.1652506Z 2025-09-07T07:14:58.1652800Z Running export/test_tools 1/1 ... [2025-09-07 07:14:58.165092] 2025-09-07T07:14:58.1653418Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:14:58.1656720Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tools.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:14:58.165417] 2025-09-07T07:15:01.7852448Z 2025-09-07T07:15:01.7853575Z export/test_tools 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tools_1.1_43f60b05b6473a51_.log 2025-09-07T07:15:01.7854830Z Running 0 items in this shard: 2025-09-07T07:15:01.7855108Z 2025-09-07T07:15:01.7857593Z Running dynamo/test_subgraphs 1/1 ... 
[2025-09-07 07:15:01.785480] 2025-09-07T07:15:01.7858228Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:01.7859687Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_subgraphs.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:01.785781] 2025-09-07T07:15:05.4054635Z 2025-09-07T07:15:05.4055695Z dynamo/test_subgraphs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_subgraphs_1.1_f51809bdeaeaffc4_.log 2025-09-07T07:15:05.4057033Z Running 0 items in this shard: 2025-09-07T07:15:05.4057356Z 2025-09-07T07:15:05.4061770Z Running dynamo/test_dynamic_shapes 1/1 ... [2025-09-07 07:15:05.405973] 2025-09-07T07:15:05.4062421Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:05.4065590Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dynamic_shapes.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:05.406278] 2025-09-07T07:15:14.1829650Z 2025-09-07T07:15:14.1830701Z dynamo/test_dynamic_shapes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dynamic_shapes_1.1_adb0dba44226942f_.log 2025-09-07T07:15:14.1833211Z Running 2 items in this shard: test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dont_dce_rand_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_mem_leak_guards_dynamic_shapes 2025-09-07T07:15:14.1835392Z 2025-09-07T07:15:14.1836518Z Running inductor/test_aot_inductor_utils 1/1 ... 
[2025-09-07 07:15:14.183420] 2025-09-07T07:15:14.1837194Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:14.1840936Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:14.183809] 2025-09-07T07:15:21.0079383Z 2025-09-07T07:15:21.0080693Z inductor/test_aot_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_utils_1.1_f736d77801ac5a6d_.log 2025-09-07T07:15:21.0081993Z Running 0 items in this shard: 2025-09-07T07:15:21.0082286Z 2025-09-07T07:15:21.0084728Z Running functorch/test_ops 1/3 ... [2025-09-07 07:15:21.008282] 2025-09-07T07:15:21.0085386Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:21.0088321Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'serial', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:21.008601] 2025-09-07T07:15:28.2330421Z 2025-09-07T07:15:28.2331732Z functorch/test_ops 1/3 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_1.3_5d133cf33fc7b71f_.log 2025-09-07T07:15:28.2332863Z Running 0 items in this shard: 2025-09-07T07:15:28.2333154Z 2025-09-07T07:15:28.2336179Z Running functorch/test_ops 2/3 ... [2025-09-07 07:15:28.233375] 2025-09-07T07:15:28.2336821Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:28.2339214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'serial', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:15:28.233714] 2025-09-07T07:15:35.5584292Z 2025-09-07T07:15:35.5585401Z functorch/test_ops 2/3 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_2.3_463d1ffb9346d3bf_.log 2025-09-07T07:15:35.5588901Z Running 0 items in this shard: 2025-09-07T07:15:35.5589383Z 2025-09-07T07:15:35.5591360Z Running inductor/test_cpu_select_algorithm 1/1 ... [2025-09-07 07:15:35.558887] 2025-09-07T07:15:35.5592093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:35.5595277Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_select_algorithm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:35.559274] 2025-09-07T07:15:42.7839552Z 2025-09-07T07:15:42.7840937Z inductor/test_cpu_select_algorithm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_select_algorithm_1.1_9d08e81d1abb807f_.log 2025-09-07T07:15:42.7842261Z Running 0 items in this shard: 2025-09-07T07:15:42.7842550Z 2025-09-07T07:15:42.7846796Z Running xpu/test_gemm 1/1 ... [2025-09-07 07:15:42.784407] 2025-09-07T07:15:42.7847237Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:42.7850324Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:42.784863] 2025-09-07T07:15:46.6549798Z 2025-09-07T07:15:46.6550981Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_894453a0e1bd9216_.log 2025-09-07T07:15:46.6552007Z Running 0 items in this shard: 2025-09-07T07:15:46.6552283Z 2025-09-07T07:15:46.6554794Z Running higher_order_ops/test_invoke_quant 1/1 ... 
[2025-09-07 07:15:46.655260] 2025-09-07T07:15:46.6555508Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:46.6557883Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'higher_order_ops/test_invoke_quant.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:46.655568] 2025-09-07T07:15:53.2794481Z 2025-09-07T07:15:53.2795506Z higher_order_ops/test_invoke_quant 1/1 was successful, full logs can be found in artifacts with path test/test-reports/higher_order_ops.test_invoke_quant_1.1_cec3341ded701977_.log 2025-09-07T07:15:53.2796767Z Running 0 items in this shard: 2025-09-07T07:15:53.2797044Z 2025-09-07T07:15:53.2798668Z Running inductor/test_online_softmax 1/1 ... [2025-09-07 07:15:53.279645] 2025-09-07T07:15:53.2799324Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:15:53.2802126Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_online_softmax.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:15:53.279953] 2025-09-07T07:16:00.1038196Z 2025-09-07T07:16:00.1039075Z inductor/test_online_softmax 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_online_softmax_1.1_7d4646df5d2b050c_.log 2025-09-07T07:16:00.1039812Z Running 0 items in this shard: 2025-09-07T07:16:00.1039977Z 2025-09-07T07:16:00.1044251Z Running inductor/test_split_cat_fx_passes 1/1 ... 
[2025-09-07 07:16:00.104238] 2025-09-07T07:16:00.1044764Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:00.1047415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_split_cat_fx_passes.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:00.104547] 2025-09-07T07:16:06.9787384Z 2025-09-07T07:16:06.9788604Z inductor/test_split_cat_fx_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_split_cat_fx_passes_1.1_a02ad8c81714a5cc_.log 2025-09-07T07:16:06.9790478Z Running 0 items in this shard: 2025-09-07T07:16:06.9790853Z 2025-09-07T07:16:06.9794674Z Running test_cuda_expandable_segments 1/1 ... [2025-09-07 07:16:06.979190] 2025-09-07T07:16:06.9795159Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:06.9798175Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_expandable_segments.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:06.979643] 2025-09-07T07:16:12.0015344Z 2025-09-07T07:16:12.0016795Z test_cuda_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_expandable_segments_1.1_4eb30a489624bd99_.log 2025-09-07T07:16:12.0018081Z 2025-09-07T07:16:12.0021842Z Running test_type_hints 1/1 ... [2025-09-07 07:16:12.001952] 2025-09-07T07:16:12.0022275Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:12.0024882Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_hints.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:16:12.002273] 2025-09-07T07:16:15.3215141Z 2025-09-07T07:16:15.3216106Z test_type_hints 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_hints_1.1_c64988c872422d3c_.log 2025-09-07T07:16:15.3217178Z Running 0 items in this shard: 2025-09-07T07:16:15.3217462Z 2025-09-07T07:16:15.3222300Z Running dynamo/test_unittest 1/1 ... [2025-09-07 07:16:15.321969] 2025-09-07T07:16:15.3223204Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:15.3225811Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_unittest.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:15.322406] 2025-09-07T07:16:18.8921276Z 2025-09-07T07:16:18.8923117Z dynamo/test_unittest 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_unittest_1.1_caa53b59865b15c3_.log 2025-09-07T07:16:18.8924494Z Running 0 items in this shard: 2025-09-07T07:16:18.8924778Z 2025-09-07T07:16:18.8927760Z Running dynamo/test_guard_serialization 1/1 ... [2025-09-07 07:16:18.892532] 2025-09-07T07:16:18.8928564Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:18.8932072Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_serialization.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:18.893030] 2025-09-07T07:16:25.7172524Z 2025-09-07T07:16:25.7173719Z dynamo/test_guard_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_serialization_1.1_cd740d24a82c16e0_.log 2025-09-07T07:16:25.7175092Z Running 0 items in this shard: 2025-09-07T07:16:25.7175446Z 2025-09-07T07:16:25.7180130Z Running functorch/test_minifier 1/1 ... 
[2025-09-07 07:16:25.717769] 2025-09-07T07:16:25.7180522Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:25.7183316Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_minifier.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:25.718155] 2025-09-07T07:16:29.1876953Z 2025-09-07T07:16:29.1878121Z functorch/test_minifier 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_minifier_1.1_c230a8c253e6f99b_.log 2025-09-07T07:16:29.1879267Z Running 0 items in this shard: 2025-09-07T07:16:29.1879546Z 2025-09-07T07:16:29.1886778Z Running test_legacy_vmap 1/1 ... [2025-09-07 07:16:29.188146] 2025-09-07T07:16:29.1887696Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:29.1888681Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_legacy_vmap.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:29.188602] 2025-09-07T07:16:33.1587488Z 2025-09-07T07:16:33.1588592Z test_legacy_vmap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_legacy_vmap_1.1_65224f3340069015_.log 2025-09-07T07:16:33.1589785Z Running 0 items in this shard: 2025-09-07T07:16:33.1590048Z 2025-09-07T07:16:33.1593515Z Running dynamo/test_cudagraphs_expandable_segments 1/1 ... [2025-09-07 07:16:33.159152] 2025-09-07T07:16:33.1594071Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:33.1596978Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs_expandable_segments.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:16:33.159452] 2025-09-07T07:16:36.9792789Z 2025-09-07T07:16:36.9794229Z dynamo/test_cudagraphs_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_expandable_segments_1.1_6c130073ade79146_.log 2025-09-07T07:16:36.9795632Z Running 0 items in this shard: 2025-09-07T07:16:36.9795920Z 2025-09-07T07:16:36.9800001Z Running torch_np/numpy_tests/core/test_einsum 1/1 ... [2025-09-07 07:16:36.979702] 2025-09-07T07:16:36.9801501Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:36.9804308Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_einsum.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:36.980226] 2025-09-07T07:16:40.2995198Z 2025-09-07T07:16:40.2996973Z torch_np/numpy_tests/core/test_einsum 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_einsum_1.1_0292e11157e37d77_.log 2025-09-07T07:16:40.2998536Z Running 0 items in this shard: 2025-09-07T07:16:40.2998830Z 2025-09-07T07:16:40.2999206Z Running inductor/test_benchmarking 1/1 ... [2025-09-07 07:16:40.299716] 2025-09-07T07:16:40.2999859Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:40.3002554Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmarking.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:16:40.300008] 2025-09-07T07:16:47.1742096Z 2025-09-07T07:16:47.1743565Z inductor/test_benchmarking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmarking_1.1_67c763d2e1c43d64_.log 2025-09-07T07:16:47.1745035Z Running 0 items in this shard: 2025-09-07T07:16:47.1745308Z 2025-09-07T07:16:47.1748213Z Running dynamo/test_model_output 1/1 ... [2025-09-07 07:16:47.174618] 2025-09-07T07:16:47.1748698Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:47.1751518Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_model_output.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:47.174934] 2025-09-07T07:16:51.3953789Z 2025-09-07T07:16:51.3954925Z dynamo/test_model_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_model_output_1.1_db85ceb37a6e2001_.log 2025-09-07T07:16:51.3956108Z Running 0 items in this shard: 2025-09-07T07:16:51.3956400Z 2025-09-07T07:16:51.3957668Z Running torch_np/test_basic 1/1 ... [2025-09-07 07:16:51.395586] 2025-09-07T07:16:51.3958780Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:51.3961255Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_basic.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:51.395881] 2025-09-07T07:16:55.0657533Z 2025-09-07T07:16:55.0658608Z torch_np/test_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_basic_1.1_e8f1dc0107be1c89_.log 2025-09-07T07:16:55.0659766Z Running 0 items in this shard: 2025-09-07T07:16:55.0660095Z 2025-09-07T07:16:55.0667934Z Running test_segment_reductions 1/1 ... 
[2025-09-07 07:16:55.066204] 2025-09-07T07:16:55.0668602Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:55.0670190Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_segment_reductions.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:55.066651] 2025-09-07T07:16:58.9867401Z 2025-09-07T07:16:58.9868669Z test_segment_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_segment_reductions_1.1_2494b786d833333f_.log 2025-09-07T07:16:58.9869815Z Running 0 items in this shard: 2025-09-07T07:16:58.9870113Z 2025-09-07T07:16:58.9873262Z Running test_ops_fwd_gradients 1/1 ... [2025-09-07 07:16:58.987076] 2025-09-07T07:16:58.9873906Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:16:58.9876082Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_fwd_gradients.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:16:58.987384] 2025-09-07T07:17:04.5098803Z 2025-09-07T07:17:04.5100456Z test_ops_fwd_gradients 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_fwd_gradients_1.1_b88095eab50cbb56_.log 2025-09-07T07:17:04.5102047Z Running 0 items in this shard: 2025-09-07T07:17:04.5102383Z 2025-09-07T07:17:04.5105049Z Running inductor/test_compile 1/1 ... [2025-09-07 07:17:04.510347] 2025-09-07T07:17:04.5105515Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:04.5108820Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:17:04.510666] 2025-09-07T07:17:11.3347464Z 2025-09-07T07:17:11.3348622Z inductor/test_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_1.1_1e71d2e8198620ec_.log 2025-09-07T07:17:11.3349794Z Running 0 items in this shard: 2025-09-07T07:17:11.3350082Z 2025-09-07T07:17:11.3351985Z Running test_pruning_op 1/1 ... [2025-09-07 07:17:11.335033] 2025-09-07T07:17:11.3352609Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:11.3356140Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_pruning_op.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:11.335365] 2025-09-07T07:17:14.7047259Z 2025-09-07T07:17:14.7048248Z test_pruning_op 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_pruning_op_1.1_69d6cc8a0d89ddfb_.log 2025-09-07T07:17:14.7049350Z Running 0 items in this shard: 2025-09-07T07:17:14.7049639Z 2025-09-07T07:17:14.7055213Z Running inductor/test_multi_kernel 1/1 ... [2025-09-07 07:17:14.705208] 2025-09-07T07:17:14.7056010Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:14.7059552Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_multi_kernel.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:14.705731] 2025-09-07T07:17:21.4797239Z 2025-09-07T07:17:21.4798461Z inductor/test_multi_kernel 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_multi_kernel_1.1_f81c9225779823c5_.log 2025-09-07T07:17:21.4799656Z Running 0 items in this shard: 2025-09-07T07:17:21.4799942Z 2025-09-07T07:17:21.4802380Z Running inductor/test_decompose_mem_bound_mm 1/1 ... 
[2025-09-07 07:17:21.480018] 2025-09-07T07:17:21.4802860Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:21.4805611Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_decompose_mem_bound_mm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:21.480365] 2025-09-07T07:17:28.2041597Z 2025-09-07T07:17:28.2043024Z inductor/test_decompose_mem_bound_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_decompose_mem_bound_mm_1.1_fc951a4d5030d666_.log 2025-09-07T07:17:28.2044325Z Running 0 items in this shard: 2025-09-07T07:17:28.2044614Z 2025-09-07T07:17:28.2046994Z Running inductor/test_block_analysis 1/1 ... [2025-09-07 07:17:28.204458] 2025-09-07T07:17:28.2047650Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:28.2050045Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_block_analysis.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:28.204785] 2025-09-07T07:17:34.9787775Z 2025-09-07T07:17:34.9789124Z inductor/test_block_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_block_analysis_1.1_5d4db865ab7396d8_.log 2025-09-07T07:17:34.9790965Z Running 0 items in this shard: 2025-09-07T07:17:34.9791512Z 2025-09-07T07:17:34.9793506Z Running inductor/test_minifier_isolate 1/1 ... 
[2025-09-07 07:17:34.979112] 2025-09-07T07:17:34.9794206Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:34.9796985Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_isolate.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:34.979434] 2025-09-07T07:17:41.8536511Z 2025-09-07T07:17:41.8538022Z inductor/test_minifier_isolate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_isolate_1.1_e25b68a9a5bf2757_.log 2025-09-07T07:17:41.8539535Z Running 0 items in this shard: 2025-09-07T07:17:41.8539866Z 2025-09-07T07:17:41.8542952Z Running export/test_swap 1/1 ... [2025-09-07 07:17:41.854072] 2025-09-07T07:17:41.8543568Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:41.8546697Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_swap.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:41.854405] 2025-09-07T07:17:45.1737304Z 2025-09-07T07:17:45.1738060Z export/test_swap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_swap_1.1_90e98cc48de2b7a5_.log 2025-09-07T07:17:45.1738867Z Running 0 items in this shard: 2025-09-07T07:17:45.1739089Z 2025-09-07T07:17:45.1742915Z Running functorch/test_dims 1/1 ... [2025-09-07 07:17:45.174120] 2025-09-07T07:17:45.1743532Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:45.1747102Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_dims.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:17:45.174421] 2025-09-07T07:17:48.6437330Z 2025-09-07T07:17:48.6438426Z functorch/test_dims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_dims_1.1_ff14b66ec683e4eb_.log 2025-09-07T07:17:48.6439521Z Running 0 items in this shard: 2025-09-07T07:17:48.6439810Z 2025-09-07T07:17:48.6441042Z Running profiler/test_profiler 1/1 ... [2025-09-07 07:17:48.643905] 2025-09-07T07:17:48.6441701Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:48.6444762Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:48.644208] 2025-09-07T07:17:52.4143993Z 2025-09-07T07:17:52.4145166Z profiler/test_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_1.1_a89e5290e63656d9_.log 2025-09-07T07:17:52.4153657Z Running 10 items in this shard: test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_basic_work_in_main_thread_False, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_basic_work_in_main_thread_True, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_close_in_scope_work_in_main_thread_False, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_close_in_scope_work_in_main_thread_True, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_complex_work_in_main_thread_False, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_complex_work_in_main_thread_True, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_multiple_preexisting_work_in_main_thread_False, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_multiple_preexisting_work_in_main_thread_True, 
test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_open_in_scope_work_in_main_thread_False, test/profiler/test_profiler.py::TestProfiler::test_source_multithreaded_open_in_scope_work_in_main_thread_True 2025-09-07T07:17:52.4159065Z 2025-09-07T07:17:52.4159287Z Running inductor/test_op_dtype_prop 1/1 ... [2025-09-07 07:17:52.414865] 2025-09-07T07:17:52.4159674Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:17:52.4160608Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_dtype_prop.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:17:52.415218] 2025-09-07T07:18:00.2404082Z 2025-09-07T07:18:00.2405598Z inductor/test_op_dtype_prop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_dtype_prop_1.1_8ed5c50a50b2e41c_.log 2025-09-07T07:18:00.2407071Z Running 0 items in this shard: 2025-09-07T07:18:00.2407419Z 2025-09-07T07:18:00.2410667Z Running test_tensorexpr_pybind 1/1 ... [2025-09-07 07:18:00.240817] 2025-09-07T07:18:00.2411313Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:00.2414241Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_tensorexpr_pybind.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:00.241139] 2025-09-07T07:18:03.6604530Z 2025-09-07T07:18:03.6605437Z test_tensorexpr_pybind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_pybind_1.1_4b1361393226bec7_.log 2025-09-07T07:18:03.6606538Z Running 0 items in this shard: 2025-09-07T07:18:03.6606815Z 2025-09-07T07:18:03.6611679Z Running inductor/test_split_cat_fx_aten_passes 1/1 ... 
[2025-09-07 07:18:03.660940] 2025-09-07T07:18:03.6612868Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:03.6615795Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_split_cat_fx_aten_passes.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:03.661400] 2025-09-07T07:18:10.5356104Z 2025-09-07T07:18:10.5357535Z inductor/test_split_cat_fx_aten_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_split_cat_fx_aten_passes_1.1_08bfd2c68469a19a_.log 2025-09-07T07:18:10.5359017Z Running 0 items in this shard: 2025-09-07T07:18:10.5359363Z 2025-09-07T07:18:10.5362983Z Running dynamo/test_misc 1/1 ... [2025-09-07 07:18:10.536044] 2025-09-07T07:18:10.5363451Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:10.5366814Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_misc.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:10.536477] 2025-09-07T07:18:15.8583775Z 2025-09-07T07:18:15.8584524Z dynamo/test_misc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_misc_1.1_2269a1446552f4d2_.log 2025-09-07T07:18:15.8585160Z Running 0 items in this shard: 2025-09-07T07:18:15.8585323Z 2025-09-07T07:18:15.8588450Z Running inductor/test_loop_ordering 1/1 ... 
[2025-09-07 07:18:15.858636] 2025-09-07T07:18:15.8589306Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:15.8591555Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_loop_ordering.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:15.858948] 2025-09-07T07:18:22.7332214Z 2025-09-07T07:18:22.7334117Z inductor/test_loop_ordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_loop_ordering_1.1_c2b9fa6d0ff8ee04_.log 2025-09-07T07:18:22.7335198Z Running 0 items in this shard: 2025-09-07T07:18:22.7335412Z 2025-09-07T07:18:22.7338235Z Running inductor/test_torchinductor_dynamic_shapes 1/2 ... [2025-09-07 07:18:22.733593] 2025-09-07T07:18:22.7338780Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:22.7341734Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '-m', 'serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:22.733923] 2025-09-07T07:18:30.5092850Z 2025-09-07T07:18:30.5094531Z inductor/test_torchinductor_dynamic_shapes 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_1.2_5f0aaf692490979a_.log 2025-09-07T07:18:30.5097294Z Running 2 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_block_sizes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_block_sizes_dynamic_shapes_cuda 2025-09-07T07:18:30.5098298Z 2025-09-07T07:18:30.5101512Z Running inductor/test_cutlass_evt 1/1 ... 
[2025-09-07 07:18:30.509486] 2025-09-07T07:18:30.5101894Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:30.5102814Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutlass_evt.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:30.509800] 2025-09-07T07:18:37.2838912Z 2025-09-07T07:18:37.2840694Z inductor/test_cutlass_evt 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutlass_evt_1.1_86ffca8c0866f854_.log 2025-09-07T07:18:37.2841909Z Running 0 items in this shard: 2025-09-07T07:18:37.2842208Z 2025-09-07T07:18:37.2843368Z Running dynamo/test_sets 1/1 ... [2025-09-07 07:18:37.284127] 2025-09-07T07:18:37.2843983Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:37.2846720Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sets.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:37.284448] 2025-09-07T07:18:40.8541902Z 2025-09-07T07:18:40.8543220Z dynamo/test_sets 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sets_1.1_19ea19471a1bb15f_.log 2025-09-07T07:18:40.8544843Z Running 0 items in this shard: 2025-09-07T07:18:40.8545268Z 2025-09-07T07:18:40.8548681Z Running test_numpy_interop 1/1 ... [2025-09-07 07:18:40.854721] 2025-09-07T07:18:40.8549178Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:40.8552414Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_numpy_interop.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:18:40.855030] 2025-09-07T07:18:44.8253276Z 2025-09-07T07:18:44.8254370Z test_numpy_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numpy_interop_1.1_f7f529f21c70c849_.log 2025-09-07T07:18:44.8255455Z Running 0 items in this shard: 2025-09-07T07:18:44.8256185Z 2025-09-07T07:18:44.8260632Z Running inductor/test_cudagraph_trees_expandable_segments 1/1 ... [2025-09-07 07:18:44.825777] 2025-09-07T07:18:44.8261432Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:44.8264560Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudagraph_trees_expandable_segments.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:44.826253] 2025-09-07T07:18:51.6504154Z 2025-09-07T07:18:51.6505697Z inductor/test_cudagraph_trees_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudagraph_trees_expandable_segments_1.1_504c04bfc737951e_.log 2025-09-07T07:18:51.6507348Z Running 0 items in this shard: 2025-09-07T07:18:51.6507683Z 2025-09-07T07:18:51.6511285Z Running dynamo/test_backward_higher_order_ops 1/1 ... [2025-09-07 07:18:51.650867] 2025-09-07T07:18:51.6511844Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:51.6514631Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_backward_higher_order_ops.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:18:51.651262] 2025-09-07T07:18:55.2207438Z 2025-09-07T07:18:55.2208836Z dynamo/test_backward_higher_order_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_backward_higher_order_ops_1.1_1addaaa4b47cdb73_.log 2025-09-07T07:18:55.2210193Z Running 0 items in this shard: 2025-09-07T07:18:55.2210475Z 2025-09-07T07:18:55.2217174Z Running inductor/test_torchinductor_codegen_config_overrides 1/1 ... [2025-09-07 07:18:55.221091] 2025-09-07T07:18:55.2217728Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:18:55.2218879Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_config_overrides.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:18:55.221407] 2025-09-07T07:19:02.0454785Z 2025-09-07T07:19:02.0464626Z inductor/test_torchinductor_codegen_config_overrides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_config_overrides_1.1_2268226bc1512063_.log 2025-09-07T07:19:02.0465565Z Running 0 items in this shard: 2025-09-07T07:19:02.0465741Z 2025-09-07T07:19:02.0465914Z Running test_nestedtensor 1/1 ... [2025-09-07 07:19:02.045845] 2025-09-07T07:19:02.0466256Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:02.0467156Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:19:02.046167] 2025-09-07T07:19:07.5684550Z 2025-09-07T07:19:07.5685657Z test_nestedtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_1.1_b791067a80ce40ae_.log 2025-09-07T07:19:07.5687359Z Running 1 items in this shard: test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_backward_memory_usage_cuda_float32 2025-09-07T07:19:07.5688384Z 2025-09-07T07:19:07.5691599Z Running dynamo/test_export_mutations 1/1 ... [2025-09-07 07:19:07.568884] 2025-09-07T07:19:07.5692061Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:07.5695038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_export_mutations.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:07.569324] 2025-09-07T07:19:11.1391646Z 2025-09-07T07:19:11.1392852Z dynamo/test_export_mutations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_export_mutations_1.1_563b6cc2d424e2cf_.log 2025-09-07T07:19:11.1394087Z Running 0 items in this shard: 2025-09-07T07:19:11.1394373Z 2025-09-07T07:19:11.1398709Z Running inductor/test_scatter_optimization 1/1 ... [2025-09-07 07:19:11.139657] 2025-09-07T07:19:11.1399403Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:11.1402837Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_scatter_optimization.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:19:11.140091] 2025-09-07T07:19:17.9139054Z 2025-09-07T07:19:17.9140378Z inductor/test_scatter_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_scatter_optimization_1.1_fafe078ab5cec230_.log 2025-09-07T07:19:17.9141722Z Running 0 items in this shard: 2025-09-07T07:19:17.9142009Z 2025-09-07T07:19:17.9145804Z Running test_ops_jit 1/1 ... [2025-09-07 07:19:17.914306] 2025-09-07T07:19:17.9146478Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:17.9149895Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:17.914815] 2025-09-07T07:19:23.0364188Z 2025-09-07T07:19:23.0365215Z test_ops_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_jit_1.1_c2e29418cbd47370_.log 2025-09-07T07:19:23.0366256Z Running 0 items in this shard: 2025-09-07T07:19:23.0366551Z 2025-09-07T07:19:23.0369231Z Running torch_np/numpy_tests/core/test_multiarray 1/2 ... [2025-09-07 07:19:23.036737] 2025-09-07T07:19:23.0370004Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:23.0372581Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_multiarray.py', '-m', 'serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:19:23.037037] 2025-09-07T07:19:26.6566555Z 2025-09-07T07:19:26.6567918Z torch_np/numpy_tests/core/test_multiarray 1/2 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_multiarray_1.2_43f62229de9006dc_.log 2025-09-07T07:19:26.6569413Z Running 0 items in this shard: 2025-09-07T07:19:26.6569764Z 2025-09-07T07:19:26.6573060Z Running torch_np/numpy_tests/core/test_multiarray 2/2 ... [2025-09-07 07:19:26.657030] 2025-09-07T07:19:26.6573654Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:26.6576824Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_multiarray.py', '-m', 'serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:26.657471] 2025-09-07T07:19:30.2772002Z 2025-09-07T07:19:30.2773405Z torch_np/numpy_tests/core/test_multiarray 2/2 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_multiarray_2.2_66fc58f5cbfe54dc_.log 2025-09-07T07:19:30.2775026Z Running 0 items in this shard: 2025-09-07T07:19:30.2775315Z 2025-09-07T07:19:30.2777175Z Running functorch/test_ac 1/1 ... [2025-09-07 07:19:30.277507] 2025-09-07T07:19:30.2777793Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:30.2780206Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:19:30.277815] 2025-09-07T07:19:36.8013590Z 2025-09-07T07:19:36.8014990Z functorch/test_ac 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_1.1_a528e157cd2ffeb8_.log 2025-09-07T07:19:36.8015927Z 2025-09-07T07:19:36.8019235Z Running dynamo/test_higher_order_ops 1/1 ... [2025-09-07 07:19:36.801673] 2025-09-07T07:19:36.8019950Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:36.8022494Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_higher_order_ops.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:36.801982] 2025-09-07T07:19:44.7276630Z 2025-09-07T07:19:44.7277769Z dynamo/test_higher_order_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_higher_order_ops_1.1_124fde2e9435d308_.log 2025-09-07T07:19:44.7278966Z Running 0 items in this shard: 2025-09-07T07:19:44.7279293Z 2025-09-07T07:19:44.7280501Z Running dynamo/test_comptime 1/1 ... [2025-09-07 07:19:44.727867] 2025-09-07T07:19:44.7281100Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:44.7283960Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_comptime.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:44.728192] 2025-09-07T07:19:48.2473866Z 2025-09-07T07:19:48.2474955Z dynamo/test_comptime 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_comptime_1.1_6c9737eba3623c89_.log 2025-09-07T07:19:48.2476069Z Running 0 items in this shard: 2025-09-07T07:19:48.2476353Z 2025-09-07T07:19:48.2478926Z Running test_datapipe 1/1 ... 
[2025-09-07 07:19:48.247723] 2025-09-07T07:19:48.2479268Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:48.2482132Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_datapipe.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:48.248036] 2025-09-07T07:19:51.6678301Z 2025-09-07T07:19:51.6679870Z test_datapipe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_datapipe_1.1_75aa39dfedbc9c74_.log 2025-09-07T07:19:51.6681991Z Running 0 items in this shard: 2025-09-07T07:19:51.6682372Z 2025-09-07T07:19:51.6682951Z Running dynamo/test_logging 1/1 ... [2025-09-07 07:19:51.668020] 2025-09-07T07:19:51.6683425Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:51.6685549Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_logging.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:51.668343] 2025-09-07T07:19:58.5926093Z 2025-09-07T07:19:58.5927445Z dynamo/test_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_logging_1.1_e1f981bd8a54e319_.log 2025-09-07T07:19:58.5928579Z Running 0 items in this shard: 2025-09-07T07:19:58.5928858Z 2025-09-07T07:19:58.5931036Z Running dynamo/test_debug_utils 1/1 ... [2025-09-07 07:19:58.592869] 2025-09-07T07:19:58.5931688Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:58.5934522Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_debug_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:19:58.593189] 2025-09-07T07:20:02.8138097Z 2025-09-07T07:20:02.8139199Z dynamo/test_debug_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_debug_utils_1.1_7f6c21a000b54e58_.log 2025-09-07T07:20:02.8140888Z Running 0 items in this shard: 2025-09-07T07:20:02.8141173Z 2025-09-07T07:20:02.8141918Z Running test_out_dtype_op 1/1 ... [2025-09-07 07:20:02.814036] 2025-09-07T07:20:02.8142494Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:02.8145659Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_out_dtype_op.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:20:02.814341] 2025-09-07T07:20:06.7840897Z 2025-09-07T07:20:06.7841658Z test_out_dtype_op 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_out_dtype_op_1.1_a9e57e5ae06c07c5_.log 2025-09-07T07:20:06.7842486Z Running 0 items in this shard: 2025-09-07T07:20:06.7842710Z 2025-09-07T07:20:06.7844922Z Running functorch/test_eager_transforms 1/1 ... [2025-09-07 07:20:06.784299] 2025-09-07T07:20:06.7845452Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:06.7848275Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_eager_transforms.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:20:06.784598] 2025-09-07T07:20:12.1062942Z 2025-09-07T07:20:12.1064203Z functorch/test_eager_transforms 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_eager_transforms_1.1_9d296ae820aee5e1_.log 2025-09-07T07:20:12.1065439Z Running 0 items in this shard: 2025-09-07T07:20:12.1065709Z 2025-09-07T07:20:12.1066458Z Running export/test_hop 1/1 ... 
[2025-09-07 07:20:12.106499] 2025-09-07T07:20:12.1067048Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:12.1070292Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_hop.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:20:12.106804] 2025-09-07T07:20:16.9280802Z 2025-09-07T07:20:16.9281662Z export/test_hop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_hop_1.1_3c3665d6fec2906a_.log 2025-09-07T07:20:16.9282436Z Running 0 items in this shard: 2025-09-07T07:20:16.9283146Z 2025-09-07T07:20:16.9285671Z Running profiler/test_cpp_thread 1/1 ... [2025-09-07 07:20:16.928334] 2025-09-07T07:20:16.9286195Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:16.9288616Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_cpp_thread.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:20:16.928640] 2025-09-07T07:20:36.9698413Z 2025-09-07T07:20:36.9699675Z profiler/test_cpp_thread 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_cpp_thread_1.1_b15c75fca65b7390_.log 2025-09-07T07:20:36.9701146Z Running 0 items in this shard: 2025-09-07T07:20:36.9701477Z 2025-09-07T07:20:36.9705291Z Running dynamo/test_aot_autograd_cache 1/1 ... [2025-09-07 07:20:36.970299] 2025-09-07T07:20:36.9705848Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:36.9708680Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_aot_autograd_cache.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:20:36.970636] 2025-09-07T07:20:43.8447462Z 2025-09-07T07:20:43.8448501Z dynamo/test_aot_autograd_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_aot_autograd_cache_1.1_cc608ac0f3036dc4_.log 2025-09-07T07:20:43.8449853Z Running 0 items in this shard: 2025-09-07T07:20:43.8450924Z 2025-09-07T07:20:43.8454731Z Running inductor/test_auto_functionalize 1/1 ... [2025-09-07 07:20:43.845244] 2025-09-07T07:20:43.8455256Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:43.8458582Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_auto_functionalize.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:20:43.845627] 2025-09-07T07:20:47.4653463Z 2025-09-07T07:20:47.4654930Z inductor/test_auto_functionalize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_auto_functionalize_1.1_0af9ab5b3f42b387_.log 2025-09-07T07:20:47.4656229Z Running 0 items in this shard: 2025-09-07T07:20:47.4656514Z 2025-09-07T07:20:47.4661324Z Running torch_np/test_function_base 1/1 ... [2025-09-07 07:20:47.465842] 2025-09-07T07:20:47.4662056Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:47.4665011Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_function_base.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:20:47.466316] 2025-09-07T07:20:50.7355192Z 2025-09-07T07:20:50.7356391Z torch_np/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_function_base_1.1_efec9d06f27e4be7_.log 2025-09-07T07:20:50.7357583Z Running 0 items in this shard: 2025-09-07T07:20:50.7357884Z 2025-09-07T07:20:50.7359225Z Running dynamo/test_activation_checkpointing 1/1 ... [2025-09-07 07:20:50.735713] 2025-09-07T07:20:50.7359958Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:50.7362298Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_activation_checkpointing.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:20:50.736020] 2025-09-07T07:20:57.9603181Z 2025-09-07T07:20:57.9604519Z dynamo/test_activation_checkpointing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_activation_checkpointing_1.1_4760e1f95da347ad_.log 2025-09-07T07:20:57.9606477Z Running 0 items in this shard: 2025-09-07T07:20:57.9606834Z 2025-09-07T07:20:57.9613456Z Running cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic 1/1 ... [2025-09-07 07:20:57.960746] 2025-09-07T07:20:57.9614101Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:20:57.9615241Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:20:57.961211] 2025-09-07T07:21:04.8355083Z 2025-09-07T07:21:04.8356757Z cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp_extensions.libtorch_agnostic_extension.test.test_libtorch_agnostic_1.1_c2078f438e9a53a0_.log 2025-09-07T07:21:04.8358799Z Running 0 items in this shard: 2025-09-07T07:21:04.8359140Z 2025-09-07T07:21:04.8362032Z Running dynamo/test_aot_autograd 1/1 ... [2025-09-07 07:21:04.836011] 2025-09-07T07:21:04.8362503Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:04.8365400Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_aot_autograd.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:04.836346] 2025-09-07T07:21:08.4059847Z 2025-09-07T07:21:08.4060923Z dynamo/test_aot_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_aot_autograd_1.1_99fbf4818534f5cb_.log 2025-09-07T07:21:08.4062768Z Running 0 items in this shard: 2025-09-07T07:21:08.4063073Z 2025-09-07T07:21:08.4066595Z Running dynamo/test_graph_deduplication 1/1 ... [2025-09-07 07:21:08.406403] 2025-09-07T07:21:08.4067413Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:08.4071111Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_graph_deduplication.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:21:08.406895] 2025-09-07T07:21:11.9766592Z 2025-09-07T07:21:11.9767840Z dynamo/test_graph_deduplication 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_graph_deduplication_1.1_1f6d3a61a92d17ae_.log 2025-09-07T07:21:11.9769117Z Running 0 items in this shard: 2025-09-07T07:21:11.9769407Z 2025-09-07T07:21:11.9770727Z Running test_model_exports_to_core_aten 1/1 ... [2025-09-07 07:21:11.976903] 2025-09-07T07:21:11.9771410Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:11.9774369Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_model_exports_to_core_aten.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:11.977208] 2025-09-07T07:21:15.6970544Z 2025-09-07T07:21:15.6971696Z test_model_exports_to_core_aten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_model_exports_to_core_aten_1.1_e52c7eef0da75c1c_.log 2025-09-07T07:21:15.6973099Z Running 0 items in this shard: 2025-09-07T07:21:15.6973432Z 2025-09-07T07:21:15.6977450Z Running test_itt 1/1 ... [2025-09-07 07:21:15.697542] 2025-09-07T07:21:15.6978052Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:15.6981079Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_itt.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:15.697870] 2025-09-07T07:21:19.0170881Z 2025-09-07T07:21:19.0172523Z test_itt 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_itt_1.1_16469e4ef25ba366_.log 2025-09-07T07:21:19.0173555Z Running 0 items in this shard: 2025-09-07T07:21:19.0174045Z 2025-09-07T07:21:19.0178271Z Running test_modules 1/3 ... 
[2025-09-07 07:21:19.017589] 2025-09-07T07:21:19.0178845Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:19.0182007Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '-m', 'serial', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:19.018035] 2025-09-07T07:21:24.6404543Z 2025-09-07T07:21:24.6405626Z test_modules 1/3 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_1.3_47a098dd72d816db_.log 2025-09-07T07:21:24.6406663Z Running 0 items in this shard: 2025-09-07T07:21:24.6406970Z 2025-09-07T07:21:24.6409586Z Running test_modules 3/3 ... [2025-09-07 07:21:24.640785] 2025-09-07T07:21:24.6410200Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:24.6412851Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '-m', 'serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:24.641089] 2025-09-07T07:21:30.2634025Z 2025-09-07T07:21:30.2635115Z test_modules 3/3 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_3.3_4ef7b3f11f97e82f_.log 2025-09-07T07:21:30.2636156Z Running 0 items in this shard: 2025-09-07T07:21:30.2636449Z 2025-09-07T07:21:30.2640220Z Running inductor/test_mps_basic 1/1 ... [2025-09-07 07:21:30.263844] 2025-09-07T07:21:30.2640611Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:30.2643378Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mps_basic.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:21:30.264161] 2025-09-07T07:21:37.3886233Z 2025-09-07T07:21:37.3887622Z inductor/test_mps_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mps_basic_1.1_e07c5debc8aee355_.log 2025-09-07T07:21:37.3888650Z 2025-09-07T07:21:37.3890316Z Running test_decomp 2/22 ... [2025-09-07 07:21:37.388857] 2025-09-07T07:21:37.3890921Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:37.3893994Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=2', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:37.389177] 2025-09-07T07:21:44.2634248Z 2025-09-07T07:21:44.2635272Z test_decomp 2/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_2.22_d00f793cae9bcff9_.log 2025-09-07T07:21:44.2640166Z Running 0 items in this shard: 2025-09-07T07:21:44.2640601Z 2025-09-07T07:21:44.2640894Z Running test_decomp 3/22 ... [2025-09-07 07:21:44.263802] 2025-09-07T07:21:44.2641530Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:44.2643637Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=3', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:44.264114] 2025-09-07T07:21:51.1384021Z 2025-09-07T07:21:51.1385438Z test_decomp 3/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_3.22_ce1da52fb03ece11_.log 2025-09-07T07:21:51.1386717Z Running 0 items in this shard: 2025-09-07T07:21:51.1387046Z 2025-09-07T07:21:51.1390044Z Running test_decomp 6/22 ... 
[2025-09-07 07:21:51.138818] 2025-09-07T07:21:51.1390618Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:51.1394223Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=6', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:51.139140] 2025-09-07T07:21:57.9630330Z 2025-09-07T07:21:57.9631578Z test_decomp 6/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_6.22_ffa0bacfe9a279cb_.log 2025-09-07T07:21:57.9632632Z Running 0 items in this shard: 2025-09-07T07:21:57.9632921Z 2025-09-07T07:21:57.9634987Z Running test_decomp 7/22 ... [2025-09-07 07:21:57.963334] 2025-09-07T07:21:57.9635595Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:21:57.9638713Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=7', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:21:57.963647] 2025-09-07T07:22:04.8377930Z 2025-09-07T07:22:04.8378948Z test_decomp 7/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_7.22_3d8f022909e99bcb_.log 2025-09-07T07:22:04.8379985Z Running 0 items in this shard: 2025-09-07T07:22:04.8380273Z 2025-09-07T07:22:04.8384547Z Running test_decomp 10/22 ... [2025-09-07 07:22:04.838235] 2025-09-07T07:22:04.8385205Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:04.8388940Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=10', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:22:04.838728] 2025-09-07T07:22:11.7127770Z 2025-09-07T07:22:11.7128948Z test_decomp 10/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.22_777e06aa1f23c13a_.log 2025-09-07T07:22:11.7129960Z Running 0 items in this shard: 2025-09-07T07:22:11.7130239Z 2025-09-07T07:22:11.7131872Z Running test_decomp 11/22 ... [2025-09-07 07:22:11.713008] 2025-09-07T07:22:11.7133390Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:11.7135780Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=11', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:11.713325] 2025-09-07T07:22:18.5872458Z 2025-09-07T07:22:18.5873689Z test_decomp 11/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_11.22_af45a9e0ac7f2793_.log 2025-09-07T07:22:18.5874769Z Running 0 items in this shard: 2025-09-07T07:22:18.5875045Z 2025-09-07T07:22:18.5876502Z Running test_decomp 14/22 ... [2025-09-07 07:22:18.587465] 2025-09-07T07:22:18.5877055Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:18.5880318Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=14', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:18.587781] 2025-09-07T07:22:25.4620323Z 2025-09-07T07:22:25.4621294Z test_decomp 14/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_14.22_ad69e0d3286d577c_.log 2025-09-07T07:22:25.4622367Z Running 0 items in this shard: 2025-09-07T07:22:25.4622710Z 2025-09-07T07:22:25.4626781Z Running test_decomp 15/22 ... 
[2025-09-07 07:22:25.462466] 2025-09-07T07:22:25.4627354Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:25.4630834Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=15', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:25.462844] 2025-09-07T07:22:32.3367867Z 2025-09-07T07:22:32.3369147Z test_decomp 15/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_15.22_2805539b3af17022_.log 2025-09-07T07:22:32.3369944Z Running 0 items in this shard: 2025-09-07T07:22:32.3370166Z 2025-09-07T07:22:32.3373944Z Running test_decomp 18/22 ... [2025-09-07 07:22:32.337224] 2025-09-07T07:22:32.3374297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:32.3377596Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=18', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:32.337586] 2025-09-07T07:22:39.1615995Z 2025-09-07T07:22:39.1617045Z test_decomp 18/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_18.22_e10056c8412ea633_.log 2025-09-07T07:22:39.1618062Z Running 0 items in this shard: 2025-09-07T07:22:39.1618357Z 2025-09-07T07:22:39.1620895Z Running test_decomp 19/22 ... [2025-09-07 07:22:39.161844] 2025-09-07T07:22:39.1621474Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:39.1624301Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=19', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:22:39.162168] 2025-09-07T07:22:46.0363373Z 2025-09-07T07:22:46.0364297Z test_decomp 19/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_19.22_6211d29d5a48f5ea_.log 2025-09-07T07:22:46.0365850Z Running 0 items in this shard: 2025-09-07T07:22:46.0366137Z 2025-09-07T07:22:46.0367330Z Running test_decomp 22/22 ... [2025-09-07 07:22:46.036540] 2025-09-07T07:22:46.0367913Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:46.0371561Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=22', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:46.036858] 2025-09-07T07:22:52.8610561Z 2025-09-07T07:22:52.8611876Z test_decomp 22/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_22.22_2ed93ff905d16120_.log 2025-09-07T07:22:52.8613079Z Running 0 items in this shard: 2025-09-07T07:22:52.8613421Z 2025-09-07T07:22:52.8617293Z Running dynamo/test_einops 1/1 ... [2025-09-07 07:22:52.861496] 2025-09-07T07:22:52.8617914Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:52.8620743Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_einops.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:52.861816] 2025-09-07T07:22:56.1310142Z 2025-09-07T07:22:56.1311269Z dynamo/test_einops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_einops_1.1_7716e6027aa51cfb_.log 2025-09-07T07:22:56.1312365Z Running 0 items in this shard: 2025-09-07T07:22:56.1312644Z 2025-09-07T07:22:56.1314736Z Running dynamo/test_callback 1/1 ... 
[2025-09-07 07:22:56.131270] 2025-09-07T07:22:56.1315334Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:22:56.1318201Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_callback.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:22:56.131585] 2025-09-07T07:23:02.8554156Z 2025-09-07T07:23:02.8555341Z dynamo/test_callback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_callback_1.1_ba9a7148f2de173e_.log 2025-09-07T07:23:02.8559597Z Running 0 items in this shard: 2025-09-07T07:23:02.8559996Z 2025-09-07T07:23:02.8569014Z Running nn/test_parametrization 1/1 ... [2025-09-07 07:23:02.855531] 2025-09-07T07:23:02.8569483Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:02.8570660Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_parametrization.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:02.855854] 2025-09-07T07:23:06.8261126Z 2025-09-07T07:23:06.8262259Z nn/test_parametrization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_parametrization_1.1_c30eaf00306aca3d_.log 2025-09-07T07:23:06.8263482Z Running 0 items in this shard: 2025-09-07T07:23:06.8263758Z 2025-09-07T07:23:06.8268662Z Running test_masked 1/1 ... [2025-09-07 07:23:06.826551] 2025-09-07T07:23:06.8269090Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:06.8272107Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_masked.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:23:06.827017] 2025-09-07T07:23:11.6983955Z 2025-09-07T07:23:11.6984907Z test_masked 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_masked_1.1_69abc2fe92cc69bb_.log 2025-09-07T07:23:11.6985924Z Running 0 items in this shard: 2025-09-07T07:23:11.6986216Z 2025-09-07T07:23:11.6990733Z Running export/test_experimental 1/1 ... [2025-09-07 07:23:11.698832] 2025-09-07T07:23:11.6991261Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:11.6994511Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_experimental.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:11.699275] 2025-09-07T07:23:15.1687824Z 2025-09-07T07:23:15.1690611Z export/test_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_experimental_1.1_702e0979485d122b_.log 2025-09-07T07:23:15.1692060Z Running 0 items in this shard: 2025-09-07T07:23:15.1692363Z 2025-09-07T07:23:15.1694769Z Running nn/test_pruning 1/1 ... [2025-09-07 07:23:15.169249] 2025-09-07T07:23:15.1695505Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:15.1699295Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pruning.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:15.169759] 2025-09-07T07:23:18.7393203Z 2025-09-07T07:23:18.7394382Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_465ccf86347b0e53_.log 2025-09-07T07:23:18.7395433Z Running 0 items in this shard: 2025-09-07T07:23:18.7395727Z 2025-09-07T07:23:18.7398079Z Running export/test_converter 1/1 ... 
[2025-09-07 07:23:18.739635] 2025-09-07T07:23:18.7398707Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:18.7401591Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_converter.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:18.739947] 2025-09-07T07:23:22.3596115Z 2025-09-07T07:23:22.3597243Z export/test_converter 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_converter_1.1_642f1da030332ef4_.log 2025-09-07T07:23:22.3598421Z Running 0 items in this shard: 2025-09-07T07:23:22.3598743Z 2025-09-07T07:23:22.3603090Z Running test_bundled_inputs 1/1 ... [2025-09-07 07:23:22.360055] 2025-09-07T07:23:22.3603570Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:22.3607075Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_bundled_inputs.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:22.360455] 2025-09-07T07:23:25.6797624Z 2025-09-07T07:23:25.6798640Z test_bundled_inputs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_bundled_inputs_1.1_e94f166be58d503b_.log 2025-09-07T07:23:25.6799734Z Running 0 items in this shard: 2025-09-07T07:23:25.6800023Z 2025-09-07T07:23:25.6804241Z Running inductor/test_fxir_backend 1/1 ... [2025-09-07 07:23:25.680191] 2025-09-07T07:23:25.6805074Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:25.6808373Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fxir_backend.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:23:25.680661] 2025-09-07T07:23:32.6048269Z 2025-09-07T07:23:32.6049568Z inductor/test_fxir_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fxir_backend_1.1_9ba2d728ee052e96_.log 2025-09-07T07:23:32.6050785Z Running 0 items in this shard: 2025-09-07T07:23:32.6051066Z 2025-09-07T07:23:32.6052968Z Running torch_np/numpy_tests/lib/test_histograms 1/1 ... [2025-09-07 07:23:32.605036] 2025-09-07T07:23:32.6053654Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:32.6055696Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_histograms.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:32.605355] 2025-09-07T07:23:35.9746672Z 2025-09-07T07:23:35.9748139Z torch_np/numpy_tests/lib/test_histograms 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_histograms_1.1_6918234645216b41_.log 2025-09-07T07:23:35.9750326Z Running 0 items in this shard: 2025-09-07T07:23:35.9750849Z 2025-09-07T07:23:35.9752898Z Running test_maskedtensor 1/1 ... [2025-09-07 07:23:35.975108] 2025-09-07T07:23:35.9753513Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:35.9756782Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_maskedtensor.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:23:35.975422] 2025-09-07T07:23:40.9971781Z 2025-09-07T07:23:40.9972644Z test_maskedtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_maskedtensor_1.1_93f03fb0b8507b53_.log 2025-09-07T07:23:40.9973451Z Running 0 items in this shard: 2025-09-07T07:23:40.9973654Z 2025-09-07T07:23:40.9976678Z Running test_autograd 1/1 ... [2025-09-07 07:23:40.997498] 2025-09-07T07:23:40.9977132Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:40.9980463Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:40.997815] 2025-09-07T07:23:46.0694169Z 2025-09-07T07:23:46.0695208Z test_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_1.1_bb02a26875b229a9_.log 2025-09-07T07:23:46.0696296Z Running 0 items in this shard: 2025-09-07T07:23:46.0696643Z 2025-09-07T07:23:46.0701073Z Running dynamo/test_reorder_logs 1/1 ... [2025-09-07 07:23:46.069864] 2025-09-07T07:23:46.0701571Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:46.0705191Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:46.070304] 2025-09-07T07:23:49.6396874Z 2025-09-07T07:23:49.6398034Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_641fde8debd9ddf3_.log 2025-09-07T07:23:49.6399217Z Running 0 items in this shard: 2025-09-07T07:23:49.6399513Z 2025-09-07T07:23:49.6400741Z Running dynamo/test_exceptions 1/1 ... 
[2025-09-07 07:23:49.639899] 2025-09-07T07:23:49.6401406Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:49.6404743Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_exceptions.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:49.640212] 2025-09-07T07:23:53.2597743Z 2025-09-07T07:23:53.2598938Z dynamo/test_exceptions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_exceptions_1.1_01ec39124c013a93_.log 2025-09-07T07:23:53.2600147Z Running 0 items in this shard: 2025-09-07T07:23:53.2600431Z 2025-09-07T07:23:53.2601060Z Running export/test_lift_unlift 1/1 ... [2025-09-07 07:23:53.259919] 2025-09-07T07:23:53.2601696Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:53.2604741Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_lift_unlift.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:53.260214] 2025-09-07T07:23:56.4792858Z 2025-09-07T07:23:56.4794011Z export/test_lift_unlift 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_lift_unlift_1.1_9fd39cd0cf4f883c_.log 2025-09-07T07:23:56.4795157Z Running 0 items in this shard: 2025-09-07T07:23:56.4795434Z 2025-09-07T07:23:56.4798230Z Running test_public_bindings 1/1 ... [2025-09-07 07:23:56.479577] 2025-09-07T07:23:56.4799125Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:56.4801142Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_public_bindings.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:23:56.479879] 2025-09-07T07:23:59.7490260Z 2025-09-07T07:23:59.7491327Z test_public_bindings 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_public_bindings_1.1_c0402dc0502a17a8_.log 2025-09-07T07:23:59.7492506Z Running 0 items in this shard: 2025-09-07T07:23:59.7492789Z 2025-09-07T07:23:59.7493583Z Running dynamo/test_exc 1/1 ... [2025-09-07 07:23:59.749184] 2025-09-07T07:23:59.7494320Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:23:59.7497576Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_exc.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:23:59.749497] 2025-09-07T07:24:03.4192729Z 2025-09-07T07:24:03.4193644Z dynamo/test_exc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_exc_1.1_cbb87f6c4abb2508_.log 2025-09-07T07:24:03.4194721Z Running 0 items in this shard: 2025-09-07T07:24:03.4195013Z 2025-09-07T07:24:03.4196891Z Running test_sparse_semi_structured 1/1 ... [2025-09-07 07:24:03.419505] 2025-09-07T07:24:03.4197389Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:03.4200173Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_semi_structured.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:03.419823] 2025-09-07T07:24:10.2938964Z 2025-09-07T07:24:10.2940220Z test_sparse_semi_structured 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_semi_structured_1.1_f8b4e2c5ad29e8c5_.log 2025-09-07T07:24:10.2941447Z Running 0 items in this shard: 2025-09-07T07:24:10.2941748Z 2025-09-07T07:24:10.2942122Z Running dynamo/test_input_attr_tracking 1/1 ... 
[2025-09-07 07:24:10.294003] 2025-09-07T07:24:10.2942800Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:10.2946307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_input_attr_tracking.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:10.294371] 2025-09-07T07:24:13.8138476Z 2025-09-07T07:24:13.8139536Z dynamo/test_input_attr_tracking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_input_attr_tracking_1.1_ceb4639200b7373e_.log 2025-09-07T07:24:13.8140720Z Running 0 items in this shard: 2025-09-07T07:24:13.8141020Z 2025-09-07T07:24:13.8145166Z Running functorch/test_control_flow 1/1 ... [2025-09-07 07:24:13.814289] 2025-09-07T07:24:13.8145949Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:13.8149186Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_control_flow.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:13.814752] 2025-09-07T07:24:18.3357187Z 2025-09-07T07:24:18.3358066Z functorch/test_control_flow 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_control_flow_1.1_17b63c4a7bb8c941_.log 2025-09-07T07:24:18.3358944Z Running 0 items in this shard: 2025-09-07T07:24:18.3359144Z 2025-09-07T07:24:18.3362116Z Running test_matmul_cuda 1/1 ... 
[2025-09-07 07:24:18.336027] 2025-09-07T07:24:18.3362604Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:18.3365844Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_matmul_cuda.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:18.336330] 2025-09-07T07:24:22.5569628Z 2025-09-07T07:24:22.5570733Z test_matmul_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_matmul_cuda_1.1_0f6b10fbb2df574e_.log 2025-09-07T07:24:22.5571965Z Running 0 items in this shard: 2025-09-07T07:24:22.5572292Z 2025-09-07T07:24:22.5575058Z Running test_dataloader 1/2 ... [2025-09-07 07:24:22.557312] 2025-09-07T07:24:22.5575658Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:22.5578875Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dataloader.py', '-m', 'serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:22.557628] 2025-09-07T07:24:26.9783614Z 2025-09-07T07:24:26.9784846Z test_dataloader 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_dataloader_1.2_e8105ef24455da93_.log 2025-09-07T07:24:26.9785989Z Running 0 items in this shard: 2025-09-07T07:24:26.9786268Z 2025-09-07T07:24:26.9789016Z Running test_dataloader 2/2 ... [2025-09-07 07:24:26.978720] 2025-09-07T07:24:26.9789599Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:26.9792622Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dataloader.py', '-m', 'serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:24:26.979032] 2025-09-07T07:24:31.4501263Z 2025-09-07T07:24:31.4502976Z test_dataloader 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_dataloader_2.2_8b6b6e1de55568c6_.log 2025-09-07T07:24:31.4504086Z Running 0 items in this shard: 2025-09-07T07:24:31.4504402Z 2025-09-07T07:24:31.4505493Z Running optim/test_swa_utils 1/1 ... [2025-09-07 07:24:31.450367] 2025-09-07T07:24:31.4506094Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:31.4508609Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_swa_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:31.450682] 2025-09-07T07:24:34.6196554Z 2025-09-07T07:24:34.6197620Z optim/test_swa_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_swa_utils_1.1_571bd17f016c6c37_.log 2025-09-07T07:24:34.6198598Z 2025-09-07T07:24:34.6199686Z Running test_xnnpack_integration 2/4 ... [2025-09-07 07:24:34.619810] 2025-09-07T07:24:34.6200365Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:34.6203728Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '-m', 'serial', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:34.620114] 2025-09-07T07:24:37.9394001Z 2025-09-07T07:24:37.9395055Z test_xnnpack_integration 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_2.4_6ccc1b0a95335c72_.log 2025-09-07T07:24:37.9396244Z Running 0 items in this shard: 2025-09-07T07:24:37.9397114Z 2025-09-07T07:24:37.9405118Z Running test_xnnpack_integration 4/4 ... 
[2025-09-07 07:24:37.939859] 2025-09-07T07:24:37.9405647Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:37.9407089Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '-m', 'serial', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:37.940356] 2025-09-07T07:24:41.2095687Z 2025-09-07T07:24:41.2096796Z test_xnnpack_integration 4/4 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_4.4_650baaccada1977b_.log 2025-09-07T07:24:41.2097963Z Running 0 items in this shard: 2025-09-07T07:24:41.2098250Z 2025-09-07T07:24:41.2099433Z Running test_mkldnn 1/1 ... [2025-09-07 07:24:41.209728] 2025-09-07T07:24:41.2100012Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:41.2102784Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:41.210027] 2025-09-07T07:24:45.0300696Z 2025-09-07T07:24:45.0301604Z test_mkldnn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_1.1_131669492ccb6c80_.log 2025-09-07T07:24:45.0302637Z Running 0 items in this shard: 2025-09-07T07:24:45.0302998Z 2025-09-07T07:24:45.0316872Z Running test_linalg 2/3 ... [2025-09-07 07:24:45.030502] 2025-09-07T07:24:45.0317297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:45.0318212Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_linalg.py', '-m', 'serial', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:24:45.030891] 2025-09-07T07:24:49.8022340Z 2025-09-07T07:24:49.8023367Z test_linalg 2/3 was successful, full logs can be found in artifacts with path test/test-reports/test_linalg_2.3_91d91c3b9a2eb579_.log 2025-09-07T07:24:49.8027727Z Running 6 items in this shard: test/test_linalg.py::TestLinalgCUDA::test_svd_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_svd_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_svd_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_svd_memory_allocation_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_svd_memory_allocation_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_svd_memory_allocation_cuda_float64 2025-09-07T07:24:49.8029586Z 2025-09-07T07:24:49.8029806Z Running test_mkldnn_fusion 1/1 ... [2025-09-07 07:24:49.802745] 2025-09-07T07:24:49.8030258Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:49.8032707Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn_fusion.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:49.803055] 2025-09-07T07:24:53.2225563Z 2025-09-07T07:24:53.2226572Z test_mkldnn_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_fusion_1.1_8deca3c464750856_.log 2025-09-07T07:24:53.2227701Z Running 0 items in this shard: 2025-09-07T07:24:53.2227995Z 2025-09-07T07:24:53.2230306Z Running test_sparse_csr 1/1 ... [2025-09-07 07:24:53.222902] 2025-09-07T07:24:53.2230749Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:53.2233882Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_csr.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:24:53.223186] 2025-09-07T07:24:59.4964302Z 2025-09-07T07:24:59.4965458Z test_sparse_csr 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_csr_1.1_32a29b06b1f067f6_.log 2025-09-07T07:24:59.4967342Z Running 0 items in this shard: 2025-09-07T07:24:59.4967693Z 2025-09-07T07:24:59.4971484Z Running test_type_promotion 1/1 ... [2025-09-07 07:24:59.496910] 2025-09-07T07:24:59.4971859Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:24:59.4975010Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_promotion.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:24:59.497300] 2025-09-07T07:25:03.6179509Z 2025-09-07T07:25:03.6180384Z test_type_promotion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_promotion_1.1_609d120a34cc858e_.log 2025-09-07T07:25:03.6181481Z Running 0 items in this shard: 2025-09-07T07:25:03.6181760Z 2025-09-07T07:25:03.6182146Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T07:25:03.6182863Z Uploading artifacts took 0.00 seconds 2025-09-07T07:25:03.6187306Z Running torch_np/test_reductions 1/1 ... [2025-09-07 07:25:03.618200] 2025-09-07T07:25:03.6187710Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:03.6188673Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_reductions.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:25:03.618511] 2025-09-07T07:25:07.0880196Z 2025-09-07T07:25:07.0881525Z torch_np/test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_reductions_1.1_9a8a49c8f896f062_.log 2025-09-07T07:25:07.0882931Z Running 0 items in this shard: 2025-09-07T07:25:07.0883268Z 2025-09-07T07:25:07.0886703Z Running test_dlpack 1/1 ... [2025-09-07 07:25:07.088451] 2025-09-07T07:25:07.0887309Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:07.0890098Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dlpack.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:07.088756] 2025-09-07T07:25:11.1591623Z 2025-09-07T07:25:11.1593224Z test_dlpack 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dlpack_1.1_584084fbcb1928ac_.log 2025-09-07T07:25:11.1594275Z Running 0 items in this shard: 2025-09-07T07:25:11.1594561Z 2025-09-07T07:25:11.1595351Z Running torch_np/numpy_tests/core/test_scalar_ctors 1/1 ... [2025-09-07 07:25:11.159300] 2025-09-07T07:25:11.1596074Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:11.1598711Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_scalar_ctors.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:25:11.159606] 2025-09-07T07:25:14.4288027Z 2025-09-07T07:25:14.4289309Z torch_np/numpy_tests/core/test_scalar_ctors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_scalar_ctors_1.1_6d261eef20a72ba2_.log 2025-09-07T07:25:14.4290918Z Running 0 items in this shard: 2025-09-07T07:25:14.4291297Z 2025-09-07T07:25:14.4294747Z Running profiler/test_profiler_tree 1/1 ... [2025-09-07 07:25:14.429296] 2025-09-07T07:25:14.4295254Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:14.4298519Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler_tree.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:14.429636] 2025-09-07T07:25:17.6989937Z 2025-09-07T07:25:17.6991140Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_c1619a004b5e3f3e_.log 2025-09-07T07:25:17.6992573Z Running 0 items in this shard: 2025-09-07T07:25:17.6992928Z 2025-09-07T07:25:17.6996211Z Running test_prims 1/1 ... [2025-09-07 07:25:17.699461] 2025-09-07T07:25:17.6996667Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:17.7000239Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_prims.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:17.699781] 2025-09-07T07:25:22.6212795Z 2025-09-07T07:25:22.6213729Z test_prims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_prims_1.1_48128be5836bfdd1_.log 2025-09-07T07:25:22.6215015Z Running 0 items in this shard: 2025-09-07T07:25:22.6215254Z 2025-09-07T07:25:22.6217144Z Running test_jit_autocast 1/1 ... 
[2025-09-07 07:25:22.621564] 2025-09-07T07:25:22.6217583Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:22.6220864Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:22.621878] 2025-09-07T07:25:28.1944181Z 2025-09-07T07:25:28.1945371Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_ccae2ced2caa337a_.log 2025-09-07T07:25:28.1946470Z Running 0 items in this shard: 2025-09-07T07:25:28.1946754Z 2025-09-07T07:25:28.1948804Z Running profiler/test_torch_tidy 1/1 ... [2025-09-07 07:25:28.194668] 2025-09-07T07:25:28.1949445Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:28.1951841Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_torch_tidy.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:28.194986] 2025-09-07T07:25:31.5142347Z 2025-09-07T07:25:31.5144348Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_8e743eee85b4224e_.log 2025-09-07T07:25:31.5145825Z Running 0 items in this shard: 2025-09-07T07:25:31.5146206Z 2025-09-07T07:25:31.5149045Z Running profiler/test_python_tracer 1/1 ... [2025-09-07 07:25:31.514708] 2025-09-07T07:25:31.5149707Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:31.5152787Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_python_tracer.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:25:31.515010] 2025-09-07T07:25:34.8342832Z 2025-09-07T07:25:34.8343931Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_a5005fca21102adc_.log 2025-09-07T07:25:34.8345106Z Running 0 items in this shard: 2025-09-07T07:25:34.8345436Z 2025-09-07T07:25:34.8349378Z Running lazy/test_reuse_ir 1/1 ... [2025-09-07 07:25:34.834725] 2025-09-07T07:25:34.8349916Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:34.8353096Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_reuse_ir.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:34.835124] 2025-09-07T07:25:38.1543914Z 2025-09-07T07:25:38.1544910Z lazy/test_reuse_ir 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_reuse_ir_1.1_1b8d85045255914e_.log 2025-09-07T07:25:38.1546579Z Running 0 items in this shard: 2025-09-07T07:25:38.1546864Z 2025-09-07T07:25:38.1547637Z Running test_quantization 1/13 ... [2025-09-07 07:25:38.154586] 2025-09-07T07:25:38.1548242Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:38.1550843Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=1', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:38.154897] 2025-09-07T07:25:44.4281688Z 2025-09-07T07:25:44.4282605Z test_quantization 1/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_1.13_954d5ff38969e1ff_.log 2025-09-07T07:25:44.4283670Z Running 0 items in this shard: 2025-09-07T07:25:44.4283963Z 2025-09-07T07:25:44.4285998Z Running test_quantization 2/13 ... 
[2025-09-07 07:25:44.428406] 2025-09-07T07:25:44.4286597Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:44.4289227Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=2', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:44.428720] 2025-09-07T07:25:50.4015976Z 2025-09-07T07:25:50.4017146Z test_quantization 2/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_2.13_486de18b0add4efe_.log 2025-09-07T07:25:50.4018226Z Running 0 items in this shard: 2025-09-07T07:25:50.4018506Z 2025-09-07T07:25:50.4023017Z Running test_quantization 5/13 ... [2025-09-07 07:25:50.402074] 2025-09-07T07:25:50.4023728Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:50.4027020Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=5', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:25:50.402538] 2025-09-07T07:25:56.4254350Z 2025-09-07T07:25:56.4255665Z test_quantization 5/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_5.13_144407d270bc5a51_.log 2025-09-07T07:25:56.4256998Z Running 0 items in this shard: 2025-09-07T07:25:56.4257336Z 2025-09-07T07:25:56.4261443Z Running test_quantization 6/13 ... [2025-09-07 07:25:56.425879] 2025-09-07T07:25:56.4262118Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:25:56.4264410Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=6', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:25:56.426197] 2025-09-07T07:26:02.3992713Z 2025-09-07T07:26:02.3993773Z test_quantization 6/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_6.13_2b77c0015dc12960_.log 2025-09-07T07:26:02.3994919Z Running 0 items in this shard: 2025-09-07T07:26:02.3995214Z 2025-09-07T07:26:02.3996748Z Running test_quantization 9/13 ... [2025-09-07 07:26:02.399468] 2025-09-07T07:26:02.3997385Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:02.3999776Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=9', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:02.399786] 2025-09-07T07:26:08.4227269Z 2025-09-07T07:26:08.4228387Z test_quantization 9/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_9.13_d24bdb093e234273_.log 2025-09-07T07:26:08.4229462Z Running 0 items in this shard: 2025-09-07T07:26:08.4229752Z 2025-09-07T07:26:08.4232111Z Running test_quantization 10/13 ... [2025-09-07 07:26:08.423023] 2025-09-07T07:26:08.4232997Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:08.4235538Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=10', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:08.423361] 2025-09-07T07:26:14.3963316Z 2025-09-07T07:26:14.3964869Z test_quantization 10/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_10.13_51f24b80715ac686_.log 2025-09-07T07:26:14.3966417Z Running 0 items in this shard: 2025-09-07T07:26:14.3966769Z 2025-09-07T07:26:14.3969870Z Running test_quantization 13/13 ... 
[2025-09-07 07:26:14.396809] 2025-09-07T07:26:14.3970319Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:14.3973332Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=13', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:14.397119] 2025-09-07T07:26:20.4202019Z 2025-09-07T07:26:20.4203164Z test_quantization 13/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_13.13_a4c0b6037fd0eab5_.log 2025-09-07T07:26:20.4204274Z Running 0 items in this shard: 2025-09-07T07:26:20.4204620Z 2025-09-07T07:26:23.4537489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.4539092Z import pkg_resources 2025-09-07T07:26:23.5571921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.5573425Z import pkg_resources 2025-09-07T07:26:23.5682439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T07:26:23.5683975Z import pkg_resources 2025-09-07T07:26:23.5750195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.5751697Z import pkg_resources 2025-09-07T07:26:23.6133089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.6134683Z import pkg_resources 2025-09-07T07:26:23.6164300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.6165789Z import pkg_resources 2025-09-07T07:26:23.6298004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.6300040Z import pkg_resources 2025-09-07T07:26:23.6313317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. 
Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:26:23.6314725Z import pkg_resources 2025-09-07T07:26:24.1235519Z Running inductor/test_aot_inductor 1/1 ... [2025-09-07 07:26:24.123226] 2025-09-07T07:26:24.1236093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.1238402Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:24.123619] 2025-09-07T07:26:24.3036162Z Running inductor/test_triton_extension_backend 1/1 ... [2025-09-07 07:26:24.303424] 2025-09-07T07:26:24.3036711Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.3040299Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_extension_backend.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:24.303814] 2025-09-07T07:26:24.3442476Z Running inductor/test_compiled_autograd 2/2 ... [2025-09-07 07:26:24.343965] 2025-09-07T07:26:24.3443052Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.3443505Z Running test_comparison_utils 1/1 ... [2025-09-07 07:26:24.344166] 2025-09-07T07:26:24.3443955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.3445475Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '-m', 'not serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:24.344345] 2025-09-07T07:26:24.3448568Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_comparison_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:24.344546] 2025-09-07T07:26:24.4034729Z Running inductor/test_provenance_tracing 1/1 ... [2025-09-07 07:26:24.403254] 2025-09-07T07:26:24.4035493Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.4039063Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_provenance_tracing.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:24.403647] 2025-09-07T07:26:24.4154309Z Running export/test_functionalized_assertions 1/1 ... [2025-09-07 07:26:24.415269] 2025-09-07T07:26:24.4154850Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.4158684Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_functionalized_assertions.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:24.415649] 2025-09-07T07:26:24.4159986Z Running test_license 1/1 ... [2025-09-07 07:26:24.415664] 2025-09-07T07:26:24.4160402Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.4160851Z Running dynamo/test_base_output 1/1 ... [2025-09-07 07:26:24.415814] 2025-09-07T07:26:24.4161297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:24.4162792Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_license.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:24.416083] 2025-09-07T07:26:24.4164968Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_base_output.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:24.416242] 2025-09-07T07:26:28.2647513Z 2025-09-07T07:26:28.2648539Z test_comparison_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_comparison_utils_1.1_bb721716171ab1e7_.log 2025-09-07T07:26:28.2652487Z Running 7 items in this shard: test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert, test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert_nones, test/test_comparison_utils.py::TestComparisonUtils::test_assert_device, test/test_comparison_utils.py::TestComparisonUtils::test_assert_dtype, test/test_comparison_utils.py::TestComparisonUtils::test_assert_layout, test/test_comparison_utils.py::TestComparisonUtils::test_assert_sizes, test/test_comparison_utils.py::TestComparisonUtils::test_assert_strides 2025-09-07T07:26:28.2655735Z 2025-09-07T07:26:28.2656097Z Running inductor/test_triton_kernels 1/1 ... [2025-09-07 07:26:28.264997] 2025-09-07T07:26:28.2656807Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:28.2657907Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:28.265345] 2025-09-07T07:26:28.3355647Z 2025-09-07T07:26:28.3356744Z export/test_functionalized_assertions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_functionalized_assertions_1.1_8d34832032c116eb_.log 2025-09-07T07:26:28.3359461Z Running 2 items in this shard: test/export/test_functionalized_assertions.py::TestFuntionalAssertions::test_functional_assert_async_msg, test/export/test_functionalized_assertions.py::TestFuntionalAssertions::test_functional_sym_constrain_range 2025-09-07T07:26:28.3361015Z 2025-09-07T07:26:28.3361635Z Running test_mkldnn_verbose 1/1 ... [2025-09-07 07:26:28.335873] 2025-09-07T07:26:28.3362267Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:28.3364251Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn_verbose.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:28.336201] 2025-09-07T07:26:28.4364645Z 2025-09-07T07:26:28.4365387Z test_license 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_license_1.1_f1ed77887135556f_.log 2025-09-07T07:26:28.4367392Z Running 2 items in this shard: test/test_license.py::TestLicense::test_distinfo_license, test/test_license.py::TestLicense::test_license_for_wheel 2025-09-07T07:26:28.4368249Z 2025-09-07T07:26:28.4370857Z Running inductor/test_inductor_utils 1/1 ... [2025-09-07 07:26:28.436933] 2025-09-07T07:26:28.4371299Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:28.4374903Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:28.437304] 2025-09-07T07:26:28.4864195Z 2025-09-07T07:26:28.4865075Z dynamo/test_base_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_base_output_1.1_92648911cc440d69_.log 2025-09-07T07:26:28.4868336Z Running 6 items in this shard: test/dynamo/test_base_output.py::TestBaseOutput::test_assign, test/dynamo/test_base_output.py::TestBaseOutput::test_create, test/dynamo/test_base_output.py::TestBaseOutput::test_getattr, test/dynamo/test_base_output.py::TestBaseOutput::test_getitem, test/dynamo/test_base_output.py::TestBaseOutput::test_index, test/dynamo/test_base_output.py::TestBaseOutput::test_tuple 2025-09-07T07:26:28.4871040Z 2025-09-07T07:26:28.4871474Z Running inductor/test_flex_decoding 1/1 ... [2025-09-07 07:26:28.486875] 2025-09-07T07:26:28.4872042Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:28.4874906Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_decoding.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:28.487303] 2025-09-07T07:26:31.7287141Z 2025-09-07T07:26:31.7288386Z inductor/test_provenance_tracing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_provenance_tracing_1.1_5487851fb7236509_.log 2025-09-07T07:26:31.7296030Z Running 11 items in this shard: test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_combo_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_cuda, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_extern_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingNodeMapping::test_create_node_mapping, test/inductor/test_provenance_tracing.py::TestProvenanceTracingNodeMeta::test_pattern_matcher_transfer_meta, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_cpu_extern_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_create_kernel_information_json_function, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_kernel_information_generation, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_no_kernel_information_without_provenance_tracking, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_tlparse_kernel_stack_traces 2025-09-07T07:26:31.7300986Z 2025-09-07T07:26:31.7301380Z Running cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable 1/1 ... 
[2025-09-07 07:26:31.728923] 2025-09-07T07:26:31.7301946Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:31.7303087Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:31.729249] 2025-09-07T07:26:31.9557269Z 2025-09-07T07:26:31.9558232Z test_mkldnn_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_verbose_1.1_22bb3f65be0ed380_.log 2025-09-07T07:26:31.9560050Z Running 2 items in this shard: test/test_mkldnn_verbose.py::TestMKLDNNVerbose::test_verbose_off, test/test_mkldnn_verbose.py::TestMKLDNNVerbose::test_verbose_on 2025-09-07T07:26:31.9561029Z 2025-09-07T07:26:31.9561370Z Running inductor/test_analysis 1/1 ... [2025-09-07 07:26:31.955860] 2025-09-07T07:26:31.9561989Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:31.9564209Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_analysis.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:31.956188] 2025-09-07T07:26:32.5078998Z 2025-09-07T07:26:32.5080351Z inductor/test_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_utils_1.1_0581817509cfcde5_.log 2025-09-07T07:26:32.5082961Z Running 2 items in this shard: test/inductor/test_inductor_utils.py::TestBench::test_benchmarker, test/inductor/test_inductor_utils.py::TestBench::test_do_bench_using_profiling 2025-09-07T07:26:32.5084114Z 2025-09-07T07:26:32.5084736Z Running test_rename_privateuse1_to_existing_device 1/1 ... 
[2025-09-07 07:26:32.507886] 2025-09-07T07:26:32.5085661Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:32.5087387Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_rename_privateuse1_to_existing_device.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:32.508313] 2025-09-07T07:26:33.0803697Z 2025-09-07T07:26:33.0805263Z inductor/test_triton_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_extension_backend_1.1_b2d6b2349a6a7aab_.log 2025-09-07T07:26:33.0806787Z Running 0 items in this shard: 2025-09-07T07:26:33.0807081Z 2025-09-07T07:26:33.0807454Z Running inductor/test_cutedsl_template 1/1 ... [2025-09-07 07:26:33.080571] 2025-09-07T07:26:33.0808112Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:33.0811341Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:33.080911] 2025-09-07T07:26:35.6494346Z 2025-09-07T07:26:35.6496338Z cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp_extensions.torch_stable_test_extension.torch_stable_test.test_torch_stable_1.1_cc6b4120313e23ac_.log 2025-09-07T07:26:35.6498406Z Running 1 items in this shard: test/cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable.py::TestTorchStable::test_setup_fails 2025-09-07T07:26:35.6499106Z 2025-09-07T07:26:35.6499380Z Running inductor/test_ck_backend 1/1 ... 
[2025-09-07 07:26:35.649588] 2025-09-07T07:26:35.6500291Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:35.6501555Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_ck_backend.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:35.649942] 2025-09-07T07:26:36.0782110Z 2025-09-07T07:26:36.0783800Z test_rename_privateuse1_to_existing_device 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_rename_privateuse1_to_existing_device_1.1_7bf473635419ec94_.log 2025-09-07T07:26:36.0785734Z Running 1 items in this shard: test/test_rename_privateuse1_to_existing_device.py::TestRenamePrivateuseoneToExistingBackend::test_external_module_register_with_existing_backend 2025-09-07T07:26:36.0786453Z 2025-09-07T07:26:36.0786678Z Running inductor/test_memory_planning 1/1 ... [2025-09-07 07:26:36.078330] 2025-09-07T07:26:36.0787093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:36.0788978Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_memory_planning.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:36.078713] 2025-09-07T07:26:36.6259355Z 2025-09-07T07:26:36.6260501Z inductor/test_compiled_autograd 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_2.2_1376e9c3e76f8612_.log 2025-09-07T07:26:36.6442764Z Running 416 items in this shard: test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_accuracy, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_5_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_2_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_2_3_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_2_3_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_2_3_3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_without_zero, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_aot_bwd_gm_runnable, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_basic_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_data_dependent_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_id_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_basic_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_basic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_dynamic_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_False, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_basic, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_callback_graph_break_throws_error, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_optimize_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_optimize_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_optimize_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_division, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_compiled_fw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_non_variable_input, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_output_metadata, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_multiple_tensors_dedup, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_shape_tensor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_with_same_graph, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dont_dce_side_effects, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_from_forward, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamo_flaky_segfault, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_graph_break_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_implicit_add, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inputs_aliasing_bytecode_attr_mutations, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_simple, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logs, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logs_aot_bwd_reuse, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_mismatch_fake_tensor_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_mismatch_fake_tensor_mode_dynamic_shape, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_multiple_torch_compile, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_nested_compile, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_nested_context_manager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_no_nested_compiled_autograd, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_no_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_no_output_nodes_different_leaves_will_recompile, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_no_output_nodes_some_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_output_nodes_some_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_acc_grad, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_all_bwd_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_post_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_post_hook1, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_post_hook2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_post_hook3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_subclass_basic, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_api_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_graph_break2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_dispatch_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_auto_functionalized, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_auto_functionalized_v2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_id, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_graph, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_tensor_reference, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_grad_warnings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_attribute_deletion, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_node_isinstance, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_print_tensor, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_python_custom_function_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_to_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_retained_graph_without_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_without_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_with_nonleaf_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_with_scalar_input, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_propagates_errors_from_device_thread, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_sequential_warns_if_use_reentrant_not_passed_explcitly, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_warns_if_use_reentrant_not_passed_explcitly, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_non_reentrant_autocast_cpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_non_reentrant_autocast_gpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_arbitrary_input_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_parameter_used_in_an_out, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_saved_object_identity, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_with_context_fn, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_copy_slices_graph_task_updates, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_id, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_forward_is_no_op, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_inplace_checks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_view_checks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_inplace_on_non_default_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_inplace_on_view_of_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_local_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_mark_output_view_of_intermediate, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_no_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_non_tensor_inputs_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_return_view_in_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_save_for_forward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_multi_input, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_multi_output, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_default_saved_tensors_hooks_double_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_diagonal_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dir, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dont_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_full_backward_hook_double_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function_returns_input, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function_returns_undefined_tensor, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gc_in_destructor, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_attr_bindings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks_remove_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_mode_class_decoration, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_mode_restored_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf_many_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_multi, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_set, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_unreachable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_backward_mul_by_grad_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_no_differentiable_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_default_device_placement_context, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_get_analytical_jacobian, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_get_numerical_jacobian, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout1, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout3, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_jacobian_mismatch, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_multiple_mkldnn_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_nondeterministic, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_single_input, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_undefined_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_validates_input_mkldnn, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradient_edge_graph_ownership, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradient_edge_output, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_graph_save_on_cpu_cuda, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hessian_vector, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_with_no_name, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_increment_version, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_index_backward_does_not_save_tensor, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_indexing_duplicates, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_saved_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_input_buffer_accum, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_invalid_gradients, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_isolated_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_leaf_assignment, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_mark_non_differentiable_mixed, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_mark_non_differentiable_none, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_grad_all_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_grad_any_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_grad_hooks_invalid_mode, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multiple_insert_removal_caching, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_autograd_function_attribute_access, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_nested_anomaly_detect_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_assignment, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_copy, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_copy_sparse, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_input, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_modifies_version, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_unwrapping, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_node_ordering_when_none_returned, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_node_post_hook_registered_during_unpack_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_not_implemented_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_numpy_requires_grad, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_once_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_out_variant_raises_when_inputs_require_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pack_hook_with_inplace_modification_should_fail, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_e2e, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_multiple_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_multiple_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_on_non_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_fake, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_lstm, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_propagation, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_unboxed_only, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pynode_destruction_deadlock, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function_callbacks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function_legacy, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function_multithreaded, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_priority, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_both_depths, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_1, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_non_leaf_variable_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_inplace_over_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retains_grad_can_always_observe_tensor_prehook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_leaf_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_custom_error_propagation, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_extra_exit_during_bw_no_crash, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensors_hook_version_counter_not_shared, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_saved_original_with_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_saved_original_inplace_detach, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variables_deprecated, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saving_variable_to_disk, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_select_sum, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_preserve_pyobj, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_self_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_critical_exceptions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_exit, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_enabled, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_generator_functions_recursive, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setitem, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setitem_mask, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setting_default_saved_variable_hooks_twice_should_not_fail, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setting_default_saved_variable_hooks_twice_should_use_inner, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setup_context_when_forward_has_default_args, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_simple_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_slice_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim1, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_x_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_mm_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_over_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_to_sparse_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_type_conversions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unpack_hooks_exec_count, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unsafe_set_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_variable_traverse, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_view_func_replay, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_view_replay_enabled, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_wrapped_number_saved_tensors_hooks, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_kwargs_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_same_graph_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_set_early_stop, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_set_early_stop_no_recompution_needed, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_CompositeImplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_autogen_aten_ops_are_pt2_compliant, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_autograd_function_backed_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_autograd_notimplemented, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_autograd_notimplemented_gradmode, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_invalid_keys, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_requires_keys_for_input_optional_tensors, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_requires_keys_for_input_tensors, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_grads_are_tensor_or_none, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_CompositeImplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_views, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_Autograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_AutogradCPU, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_non_tensor, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_numel, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_type, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_partially_registered, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_returns_dict, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads_none_or_Tensor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_builtin_aten_ops_are_pt2_compliant, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_builtin_torchscript_ops, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_compile, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_fake_tracing, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_define_and_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_define_bad_schema, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_define_validation, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_define_with_tags_list, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_define_with_tags_single, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_define_with_tags_tuple, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_functionalize_error, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_cpu, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_cuda, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_function, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_invalid, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_function, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CUDA, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CompositeExplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_incorrect_schema_types, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_no_return, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_lifetime, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_load_library, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_not_implemented_error, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_cea, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_fake, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_reserved_ns, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_resolve_packet, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_save_for_backward_inputs_are_namedtuple, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_schema_matches_signature, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_sequences, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_return_types_multi_return, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_return_types_single_return, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_schemas, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_unsupported_param_types, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_access_module_attr, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_global_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_global_num_adds_guard, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_tracked_nested, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_untracked_global, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_untracked_nonlocal, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_value_created_in_subgraph, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_concat_unbacked_shape_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_branches_no_arguments_no_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_pytree_operands_with_non_tensor_leaves, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_with_empty_operands, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_dynamic_shapes_over_vmap_batch_size, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_enum_arg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_error_message_sane, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fallback_on_graph_break_complicated, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_flat_list_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fn_with_kwargs_in_torch_ops, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_incorrect_type, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_pytree_inputs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hooks, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_inlined_functions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensor_constant, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensors_with_shared_symbols, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_make_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_example_value_metadata_consistent_with_eager, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_side_effect, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_symint_input, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_modules, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_register_mode, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_same_freevar_twice, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_in_body, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_tensor_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_support_float_in_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_symint_input, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_tensor_and_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_tensor_to_list_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_tensor_with_unbacked_shape_closure, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_all_kwarg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_default, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_default_if_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_int, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_args_nested, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_args_not_const_symint_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_args_with_symint_constant, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call_disable_inline_nn_module, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_capture_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_closure_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_non_tensor_input, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_over_grad, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_two_tensor_all_grad_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_two_tensor_has_aux, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_randomness, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_two_tensors_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_freevar_python_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_jvp, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_disable_enable_disable_grad, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_disable_grad, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_linearize_jvp_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_free_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_get_wrapped, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_kwargs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_outputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_outputs_out_dims_tuple, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_new_tensor_implicit_via_op, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_new_tensor_in_body, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_new_tensor_unused_in_body, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_out_dims_None, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_over_vmap_two_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_previous_illegal_op_no_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_with_randomness, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break_2, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break_lambda, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_kwargs, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_mismatched_output, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_dropout, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_fallback, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_nested_cond, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_function_with_kwargs, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_override_fallthrough_dispatch_key, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_attribute_access_on_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_basic, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_contiguous_dtensor_noncontiguous_local_as_tangent, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_noncontiguous_output, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_tp_compile_comm_reordering, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_tp_compile_comm_reordering_graph_partition, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_auto_functionalize_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_flex_attention_backward_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_flex_attention_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_quant_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_triple_nested_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_scan_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_while_loop_stack_output_simple_cuda_float32 2025-09-07T07:26:36.6601903Z 2025-09-07T07:26:36.6602165Z Running export/test_export_with_inline_and_install 1/1 ... 
[2025-09-07 07:26:36.626890] 2025-09-07T07:26:36.6602605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:36.6603589Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_export_with_inline_and_install.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:36.627280] 2025-09-07T07:26:37.9654938Z 2025-09-07T07:26:37.9656030Z inductor/test_flex_decoding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_1.1_83e67b7119dcc47e_.log 2025-09-07T07:26:37.9910558Z Running 572 items in this shard: test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod0_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod0_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod0_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod1_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod1_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod1_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod2_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod2_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod2_head_dims2_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod3_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod3_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod3_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod4_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod4_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod4_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod5_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod5_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod5_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod6_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod6_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod6_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod7_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod7_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod7_head_dims2_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod8_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod8_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod8_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE3_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod4_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod4_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod4_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod4_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod5_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod5_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod5_BLOCK_SIZE_128_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod5_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod6_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod6_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod6_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod6_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE_64_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod0_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod0_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod0_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod0_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod3_BLOCK_SIZE2_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod3_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod3_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod3_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE3_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod7_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod7_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod7_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod7_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE_128_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE_64_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod6_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod6_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod6_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod6_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE2_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod8_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod8_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod8_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod8_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod2_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod4_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod4_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod4_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod5_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod5_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod6_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod6_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod6_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod7_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod7_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod7_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod8_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod8_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod8_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod0_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod0_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod0_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod1_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod1_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod1_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod2_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod2_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod2_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod3_head_dims0_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod3_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod3_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod4_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod4_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod4_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod5_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod5_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod5_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod6_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod6_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod6_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod7_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod7_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod7_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod8_head_dims0_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod8_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod8_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_bw_decoding_fails_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_all_dims_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_all_dims_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_all_dims_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_reduction_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_scale_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod4_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_do_not_trigger_dynamic_shapes_on_empty_block_mask_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_fully_masked_out_rows_0_check_gqa_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_function_composition_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_function_composition_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_function_composition_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod2_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod4_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod4_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod4_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod5_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod5_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod6_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod6_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod6_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod7_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod7_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod7_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod8_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod8_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod8_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod8_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod4_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod5_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod1_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod6_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod2_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod7_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_larger_block_mask_bug_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_load_from_bias_head_seq_batch_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_load_from_bias_seq_batch_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_load_from_bias_seq_only_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_bfloat16_score_mod0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_bfloat16_score_mod1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float16_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float16_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float32_score_mod0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float32_score_mod1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_only_return_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_max_autotune_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_max_autotune_with_captured_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_mixed_dtypes_fails_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls2_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls_paged_attention2_cuda, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls_paged_attention_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_njt_causal_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_njt_causal_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_njt_causal_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_multi_token_offset_mask_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_multi_token_offset_mask_with_captured_buffer_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_offset_mask_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_offset_mask_with_captured_buffer_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_bfloat16_head_dims1_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float32_head_dims0_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_bfloat16_head_dims1_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float32_head_dims0_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_sparse_mulitple_block_size_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_not_pw_of_two_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_padded_dense_causal_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims0_page_size_128_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims1_page_size_256_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims2_page_size_64_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims1_page_size_128_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims2_page_size_256_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims0_page_size_64_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_recompile_changed_score_mod_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_seq_masking_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_silu_on_score_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_skip_odd_keys_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_skip_odd_keys_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_skip_odd_keys_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s1_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s2_head_dims1_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s3_head_dims2_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_subgraph_respect_decompostion_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_tma_decoding_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_full_mask_vs_sdpa_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_full_mask_vs_sdpa_paged_attention_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_no_mask_vs_sdpa_cuda, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_no_mask_vs_sdpa_paged_attention_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_partial_block_vs_sdpa_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_partial_block_vs_sdpa_paged_attention_cuda
2025-09-07T07:26:38.0152180Z
2025-09-07T07:26:38.0152402Z Running dynamo/test_skip_guard_eval_unsafe 1/1 ... [2025-09-07 07:26:37.966803]
2025-09-07T07:26:38.0152807Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:38.0153786Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_guard_eval_unsafe.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:37.967177]
2025-09-07T07:26:39.1955558Z
2025-09-07T07:26:39.1956746Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_a5476b005c34b4e3_.log
2025-09-07T07:26:39.2107539Z Running 361 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double,
test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, test/inductor/test_triton_kernels.py::MutationTests::test_labels, 
test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op 2025-09-07T07:26:39.2239159Z 2025-09-07T07:26:39.2239390Z Running inductor/test_inplace_padding 1/1 ... [2025-09-07 07:26:39.196201] 2025-09-07T07:26:39.2239841Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:39.2240832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inplace_padding.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:39.196528] 2025-09-07T07:26:41.9850510Z 2025-09-07T07:26:41.9851937Z inductor/test_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_analysis_1.1_4127cb28189a0ba5_.log 2025-09-07T07:26:41.9865065Z Running 28 items in this shard: test/inductor/test_analysis.py::TestUtils::test_tabulate2d, test/inductor/test_analysis.py::TestUtils::test_zip_dicts, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_helper_unit_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_combine_profiles_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_combine_profiles_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float64, test/inductor/test_analysis.py::TestAnalysisCUDA::test_noop_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float16, 
test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float64 2025-09-07T07:26:41.9874307Z 2025-09-07T07:26:41.9874536Z Running dynamo/test_buffers_override 1/1 ... [2025-09-07 07:26:41.985018] 2025-09-07T07:26:41.9874924Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:41.9875886Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_buffers_override.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:41.985352] 2025-09-07T07:26:41.9876758Z 2025-09-07T07:26:41.9877280Z dynamo/test_skip_guard_eval_unsafe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_guard_eval_unsafe_1.1_9e508917a017bb6e_.log 2025-09-07T07:26:41.9879317Z Running 5 items in this shard: test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_bool_recompile, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_cache_line_pickup, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_fail_on_tensor_shape_change, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_post_recompile, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_tensor_recompile 2025-09-07T07:26:41.9880959Z 2025-09-07T07:26:41.9881128Z Running test_custom_ops 1/1 ... [2025-09-07 07:26:41.987690] 2025-09-07T07:26:41.9881548Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:41.9882490Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_custom_ops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:41.988048] 2025-09-07T07:26:43.2595103Z 2025-09-07T07:26:43.2596217Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_5fb13460ef69065f_.log 2025-09-07T07:26:43.2602803Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks 2025-09-07T07:26:43.2607343Z 2025-09-07T07:26:43.2607551Z Running inductor/test_b2b_gemm 1/1 ... [2025-09-07 07:26:43.259517] 2025-09-07T07:26:43.2607952Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:43.2609168Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_b2b_gemm.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:43.259866] 2025-09-07T07:26:45.3062784Z 2025-09-07T07:26:45.3064150Z inductor/test_memory_planning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_memory_planning_1.1_846122ed4ea701bd_.log 2025-09-07T07:26:45.3067403Z Running 4 items in this shard: test/inductor/test_memory_planning.py::TestMemoryPlanning::test_aoti, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_cpp_wrapper, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_python_wrapper, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_unbacked_symint 2025-09-07T07:26:45.3069558Z 2025-09-07T07:26:45.3069860Z Running functorch/test_ac_logging 1/1 ... [2025-09-07 07:26:45.306277] 2025-09-07T07:26:45.3070427Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:45.3071790Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac_logging.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:45.306706] 2025-09-07T07:26:45.7051243Z 2025-09-07T07:26:45.7052398Z dynamo/test_buffers_override 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_buffers_override_1.1_6b873a43ef2402dc_.log 2025-09-07T07:26:45.7054737Z Running 2 items in this shard: test/dynamo/test_buffers_override.py::TestBuffersOverride::test_buffers_override, test/dynamo/test_buffers_override.py::TestBuffersOverride::test_named_buffers_override 2025-09-07T07:26:45.7055658Z 2025-09-07T07:26:45.7055963Z Running inductor/test_inductor_annotations 1/1 ... 
[2025-09-07 07:26:45.705090] 2025-09-07T07:26:45.7056552Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:45.7068374Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_annotations.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:45.705456] 2025-09-07T07:26:46.1794541Z 2025-09-07T07:26:46.1795871Z inductor/test_ck_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_ck_backend_1.1_a24c2ec7af46c6f7_.log 2025-09-07T07:26:46.1818912Z Running 34 items in this shard: test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_addmm_max_autotune_gemm_backends_ATen,Triton,CK_x_shape0, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_addmm_max_autotune_gemm_backends_ATen,Triton,CK_x_shape1, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_addmm_max_autotune_gemm_backends_ATen,Triton,CK_x_shape2, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_addmm_max_autotune_gemm_backends_CK_x_shape0, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_addmm_max_autotune_gemm_backends_CK_x_shape1, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_addmm_max_autotune_gemm_backends_CK_x_shape2, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_conv2d_max_autotune_conv_backends_ATEN,CK,TRITON, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_conv2d_max_autotune_conv_backends_CK, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_bmm_max_autotune_gemm_backends_ATen,Triton,CK, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_bmm_max_autotune_gemm_backends_CK, 
test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_dynamic_max_autotune_gemm_backends_CK_autotune_in_subproc_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_ATen,Triton,CK_autotune_in_subproc_False_use_aoti_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_ATen,Triton,CK_autotune_in_subproc_False_use_aoti_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_ATen,Triton,CK_autotune_in_subproc_True_use_aoti_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_ATen,Triton,CK_autotune_in_subproc_True_use_aoti_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CKTILE_autotune_in_subproc_False_use_aoti_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CKTILE_autotune_in_subproc_False_use_aoti_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CKTILE_autotune_in_subproc_True_use_aoti_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CKTILE_autotune_in_subproc_True_use_aoti_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CK_autotune_in_subproc_False_use_aoti_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CK_autotune_in_subproc_False_use_aoti_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CK_autotune_in_subproc_True_use_aoti_False, 
test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_matmul_max_autotune_gemm_backends_CK_autotune_in_subproc_True_use_aoti_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_non_contiguous_max_autotune_gemm_backends_Aten,CK, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_preselected_max_autotune_gemm_backends_ATen,Triton,CK, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_precompile_preselected_max_autotune_gemm_backends_CK, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_ATen,Triton,CK_quantize_type_rowwise_has_bias_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_ATen,Triton,CK_quantize_type_rowwise_has_bias_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_ATen,Triton,CK_quantize_type_tensorwise_has_bias_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_ATen,Triton,CK_quantize_type_tensorwise_has_bias_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_CK_quantize_type_rowwise_has_bias_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_CK_quantize_type_rowwise_has_bias_True, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_CK_quantize_type_tensorwise_has_bias_False, test/inductor/test_ck_backend.py::TestCKBackend::test_max_autotune_scaled_mm_max_autotune_gemm_backends_CK_quantize_type_tensorwise_has_bias_True 2025-09-07T07:26:46.1834640Z 2025-09-07T07:26:46.1834812Z Running dynamo/test_resume 1/1 ... 
[2025-09-07 07:26:46.179585] 2025-09-07T07:26:46.1835199Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:46.1836143Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_resume.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:46.179941] 2025-09-07T07:26:47.3603499Z 2025-09-07T07:26:47.3605355Z test_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_custom_ops_1.1_957a7f8260ab64ec_.log 2025-09-07T07:26:47.3674106Z Running 280 items in this shard: test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_meta, test/test_custom_ops.py::TestCustomOp::test_autogen_aten_ops_are_pt2_compliant, test/test_custom_ops.py::TestCustomOp::test_autograd_function_backed_op, test/test_custom_ops.py::TestCustomOp::test_autograd_notimplemented, test/test_custom_ops.py::TestCustomOp::test_autograd_notimplemented_gradmode, test/test_custom_ops.py::TestCustomOp::test_backward_dict_grad_for_nontensor, test/test_custom_ops.py::TestCustomOp::test_backward_dict_invalid_keys, test/test_custom_ops.py::TestCustomOp::test_backward_dict_requires_keys_for_input_optional_tensors, test/test_custom_ops.py::TestCustomOp::test_backward_dict_requires_keys_for_input_tensors, test/test_custom_ops.py::TestCustomOp::test_backward_grads_are_tensor_or_none, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_CompositeImplicitAutograd, 
test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_no_output, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_views, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_Autograd, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_AutogradCPU, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_non_tensor, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_numel, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_tensorlist, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_type, test/test_custom_ops.py::TestCustomOp::test_backward_partially_registered, test/test_custom_ops.py::TestCustomOp::test_backward_returns_dict, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads_none_or_Tensor, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/test_custom_ops.py::TestCustomOp::test_basic_make_fx, test/test_custom_ops.py::TestCustomOp::test_builtin_aten_ops_are_pt2_compliant, test/test_custom_ops.py::TestCustomOp::test_builtin_torchscript_ops, test/test_custom_ops.py::TestCustomOp::test_data_dependent_basic, test/test_custom_ops.py::TestCustomOp::test_data_dependent_compile, test/test_custom_ops.py::TestCustomOp::test_data_dependent_fake_tracing, test/test_custom_ops.py::TestCustomOp::test_data_dependent_nms_dynamic_compile, test/test_custom_ops.py::TestCustomOp::test_define_and_impl, 
test/test_custom_ops.py::TestCustomOp::test_define_bad_schema, test/test_custom_ops.py::TestCustomOp::test_define_validation, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_list, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_single, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_tuple, test/test_custom_ops.py::TestCustomOp::test_defined_in_python, test/test_custom_ops.py::TestCustomOp::test_duplicate_impl, test/test_custom_ops.py::TestCustomOp::test_functionalize_error, test/test_custom_ops.py::TestCustomOp::test_impl_abstract_overload, test/test_custom_ops.py::TestCustomOp::test_impl_cpu, test/test_custom_ops.py::TestCustomOp::test_impl_device_cpu, test/test_custom_ops.py::TestCustomOp::test_impl_device_cuda, test/test_custom_ops.py::TestCustomOp::test_impl_device_function, test/test_custom_ops.py::TestCustomOp::test_impl_device_invalid, test/test_custom_ops.py::TestCustomOp::test_impl_function, test/test_custom_ops.py::TestCustomOp::test_impl_invalid_devices, test/test_custom_ops.py::TestCustomOp::test_impl_meta, test/test_custom_ops.py::TestCustomOp::test_impl_multiple, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CUDA, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CompositeExplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_impl_separate, test/test_custom_ops.py::TestCustomOp::test_incorrect_schema_types, test/test_custom_ops.py::TestCustomOp::test_infer_schema_no_return, test/test_custom_ops.py::TestCustomOp::test_infer_schema_supported, test/test_custom_ops.py::TestCustomOp::test_infer_schema_unsupported, 
test/test_custom_ops.py::TestCustomOp::test_invalid_qualname, test/test_custom_ops.py::TestCustomOp::test_invalid_schemas, test/test_custom_ops.py::TestCustomOp::test_is_functional_schema, test/test_custom_ops.py::TestCustomOp::test_is_tensorlist_like_type, test/test_custom_ops.py::TestCustomOp::test_legacy_define, test/test_custom_ops.py::TestCustomOp::test_legacy_impl, test/test_custom_ops.py::TestCustomOp::test_lifetime, test/test_custom_ops.py::TestCustomOp::test_load_library, test/test_custom_ops.py::TestCustomOp::test_meta_for_data_dependent_shape_operation, test/test_custom_ops.py::TestCustomOp::test_name_must_match, test/test_custom_ops.py::TestCustomOp::test_new_data_dependent_symint, test/test_custom_ops.py::TestCustomOp::test_not_implemented_error, test/test_custom_ops.py::TestCustomOp::test_override_cea, test/test_custom_ops.py::TestCustomOp::test_override_fake, test/test_custom_ops.py::TestCustomOp::test_override_impl, test/test_custom_ops.py::TestCustomOp::test_override_meta, test/test_custom_ops.py::TestCustomOp::test_private_ctor, test/test_custom_ops.py::TestCustomOp::test_reserved_ns, test/test_custom_ops.py::TestCustomOp::test_resolve_packet, test/test_custom_ops.py::TestCustomOp::test_save_for_backward_inputs_are_namedtuple, test/test_custom_ops.py::TestCustomOp::test_schema_matches_signature, test/test_custom_ops.py::TestCustomOp::test_sequences, test/test_custom_ops.py::TestCustomOp::test_supported_param_types, test/test_custom_ops.py::TestCustomOp::test_supported_return_types_multi_return, test/test_custom_ops.py::TestCustomOp::test_supported_return_types_single_return, test/test_custom_ops.py::TestCustomOp::test_supported_schemas, test/test_custom_ops.py::TestCustomOp::test_symints, test/test_custom_ops.py::TestCustomOp::test_unsupported_param_types, test/test_custom_ops.py::TestCustomOp::test_unsupported_schemas, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_delayed_error, 
test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_inplace, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_inplace, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_inplace, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm, 
test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_dont_generate, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_inplace, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_inplace, test/test_custom_ops.py::MiniOpTest::test_mm, test/test_custom_ops.py::MiniOpTest::test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_nonzero, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_mm, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_nonzero, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_sin_, 
test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_schema__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_schema__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_schema__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_schema__test_inplace, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_schema__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_schema__test_nonzero, test/test_custom_ops.py::TestCustomOpAPI::test_any_output_is_alias_to_input_or_output, test/test_custom_ops.py::TestCustomOpAPI::test_any_requires_grad, test/test_custom_ops.py::TestCustomOpAPI::test_basic, test/test_custom_ops.py::TestCustomOpAPI::test_compile, test/test_custom_ops.py::TestCustomOpAPI::test_default_values, test/test_custom_ops.py::TestCustomOpAPI::test_disallows_output_aliasing, test/test_custom_ops.py::TestCustomOpAPI::test_factory_function, test/test_custom_ops.py::TestCustomOpAPI::test_fake, test/test_custom_ops.py::TestCustomOpAPI::test_kwarg_only_tensors, test/test_custom_ops.py::TestCustomOpAPI::test_layout_constraint_tags, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel_invalid, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel_with_conditional_dispatch, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_list_input, 
test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_multiple_times, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_multiple_times_different_devices, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autograd, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autograd_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_0, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_1, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_2, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_3, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_4, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_5, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_kernel, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_kernel_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_rule_mode, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_rule_subclass, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_library_decorator, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_op_decorator, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_register_multiple_times, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_register_multiple_times_2, 
test/test_custom_ops.py::TestCustomOpAPI::test_library_schema_infer, test/test_custom_ops.py::TestCustomOpAPI::test_manual_schema, test/test_custom_ops.py::TestCustomOpAPI::test_manual_schema_error, test/test_custom_ops.py::TestCustomOpAPI::test_multi_types, test/test_custom_ops.py::TestCustomOpAPI::test_mutated, test/test_custom_ops.py::TestCustomOpAPI::test_mutated_error, test/test_custom_ops.py::TestCustomOpAPI::test_mutated_unknown, test/test_custom_ops.py::TestCustomOpAPI::test_no_grad_skips_autograd, test/test_custom_ops.py::TestCustomOpAPI::test_overloading, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_defaults, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_error_cases, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_kwargonly_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_register_vmap_defaults, test/test_custom_ops.py::TestCustomOpAPI::test_register_vmap_kwargonly_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_replacement, test/test_custom_ops.py::TestCustomOpAPI::test_set_kernel_enabled, test/test_custom_ops.py::TestCustomOpAPI::test_split_device, test/test_custom_ops.py::TestCustomOpAPI::test_supports_tensorlist, test/test_custom_ops.py::MiniOpTestOther::test_aot_dispatch_dynamic__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_aot_dispatch_static__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_autograd_registration__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_faketensor__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_mm, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_nonzero, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_sin_, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_delayed_error, 
test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_incorrect_schema, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_no_abstract, test/test_custom_ops.py::MiniOpTestOther::test_schema__test_nonzero_again, test/test_custom_ops.py::TestGenerateOpcheckTests::test_MiniOpTest, test/test_custom_ops.py::TestGenerateOpcheckTests::test_dont_generate_decorator, test/test_custom_ops.py::TestGenerateOpcheckTests::test_failures_dict_validation, test/test_custom_ops.py::TestGenerateOpcheckTests::test_generate_repro_no_save_data, test/test_custom_ops.py::TestGenerateOpcheckTests::test_generate_repro_save_data, test/test_custom_ops.py::TestGenerateOpcheckTests::test_is_inside_opcheck_mode, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_bad_op, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_customopdef, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_does_not_require_extra_deps, test/test_custom_ops.py::TestTypeConversion::test_mixed_types, test/test_custom_ops.py::TestTypeConversion::test_optional, test/test_custom_ops.py::TestTypeConversion::test_simple_tuple, test/test_custom_ops.py::TestTypeConversion::test_supported_types, test/test_custom_ops.py::TestOpProfiles::test_duplicate_registration_custom_op, test/test_custom_ops.py::TestOpProfiles::test_duplicate_registration_impl, test/test_custom_ops.py::TestOpProfiles::test_fake_registration, test/test_custom_ops.py::TestOpProfiles::test_save_to_file, test/test_custom_ops.py::TestOpProfiles::test_version, test/test_custom_ops.py::TestOpProfiles::test_yaml, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_False_dynamic_False_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_False_dynamic_True_cuda, 
test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_auto_dynamic_False_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_auto_dynamic_True_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_assert_raises_regex_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registered_at_backend_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_autograd_kernel_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_compositeimplicitautograd_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_incorrect_composite_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_incorrect_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_global_state_mutation_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_abstract_impl_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_schema_mutation_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_schema_view_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_missing_abstract_impl_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_missing_functionalization_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_fails_basic_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyCatCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyCubeCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyMulCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyMulScalarCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyNMSCustomOp_cuda_float32, 
test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyNonzeroCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySortCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySplitCopyCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyTakeCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyViewCopyCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_unbacked_stride_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_single_element_tuple_output_cuda 2025-09-07T07:26:47.3749111Z 2025-09-07T07:26:47.3749364Z Running inductor/test_template_heuristics_registry 1/1 ... [2025-09-07 07:26:47.360995] 2025-09-07T07:26:47.3749816Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:47.3750817Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_template_heuristics_registry.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:47.361360] 2025-09-07T07:26:48.1234361Z 2025-09-07T07:26:48.1235466Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_b717db97728a8f80_.log 2025-09-07T07:26:48.1240825Z Running 8 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input 2025-09-07T07:26:48.1244350Z 2025-09-07T07:26:48.1244567Z Running inductor/test_debug_trace 1/1 ... [2025-09-07 07:26:48.123511] 2025-09-07T07:26:48.1244973Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:48.1245976Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:48.123828] 2025-09-07T07:26:48.9266688Z 2025-09-07T07:26:48.9268069Z functorch/test_ac_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_logging_1.1_291b5233aa2f4d35_.log 2025-09-07T07:26:48.9270692Z Running 4 items in this shard: test/functorch/test_ac_logging.py::TestAcLogging::test_create_activation_checkpointing_logging_structure_payload, test/functorch/test_ac_logging.py::TestAcLogging::test_create_joint_graph_edges, test/functorch/test_ac_logging.py::TestAcLogging::test_create_joint_graph_node_information, test/functorch/test_ac_logging.py::TestAcLogging::test_create_structured_trace_for_min_cut_info 2025-09-07T07:26:48.9272624Z 2025-09-07T07:26:48.9273165Z Running test_ao_sparsity 1/1 ... [2025-09-07 07:26:48.926795] 2025-09-07T07:26:48.9273663Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:48.9274851Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ao_sparsity.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:48.927153] 2025-09-07T07:26:49.8495760Z 2025-09-07T07:26:49.8496695Z dynamo/test_resume 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_resume_1.1_3c82f7d41d013424_.log 2025-09-07T07:26:49.8498101Z Running 1 items in this shard: test/dynamo/test_resume.py::ResumeFunctionTests::test_freevars 2025-09-07T07:26:49.8498684Z 2025-09-07T07:26:49.8499029Z Running inductor/test_async_compile 1/1 ... 
[2025-09-07 07:26:49.849683] 2025-09-07T07:26:49.8499664Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:49.8502551Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_async_compile.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:49.849992] 2025-09-07T07:26:50.5611507Z 2025-09-07T07:26:50.5612912Z export/test_export_with_inline_and_install 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_with_inline_and_install_1.1_7dbbdfab4fe3e30e_.log 2025-09-07T07:26:50.5829121Z Running 416 items in this shard: test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_assume_static_by_default_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_constraints_error_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_constraints_error_not_in_range_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_inline_constraints_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_slice_maxsize_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_slice_unbacked_dim1_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_no_grad_param_inplace_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test__scaled_dot_product_flash_attention_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_additional_inputs_constants_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_allow_explicit_guards_as_runtime_asserts_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_args_type_checked_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_aten_lift_fresh_copy_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_attention_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_attr_assignment_extra_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_automatic_constrain_size_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_automatic_dynamic_shapes_constant_relation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_automatic_dynamic_shapes_linear_relation_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_automatic_dynamic_shapes_simple_equality_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_baddbmm_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_basic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_basic_non_strict_fake_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_basic_non_strict_real_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_bincount_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_buffer_util_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_capture_subclass_constructor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_capture_subclass_constructor_torch_ir_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_capture_subclass_wrong_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_ccode_python_mod_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_check_specialized_int_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_checks_to_constrain_range_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cleanup_dynamic_markers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_colin_unbacked_backed_vr_sub_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_colon_parameter_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_compiling_state_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_access_identical_symint_closure_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_branches_return_constant_int_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_branches_return_same_int_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_buffers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_contains_unbacked_no_escape_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_int_closure_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_unflatten_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_with_module_stack_export_with_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cond_with_module_stack_export_with_unflatten_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_aliasing_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_input_naming_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_no_user_inp_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_output_dup_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_output_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_requires_grad_const_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_return_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_tensor_mutation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_tensor_with_non_functional_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constant_tensor_with_non_functional_nested_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constrain_decomp_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constrain_size_in_eager_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constrain_size_with_constrain_value_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_constrain_size_with_various_cases_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_conv_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_crop_like_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_cse_for_symint_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_custom_op_auto_functionalize_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_custom_op_auto_warn_pre_dispatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_custom_op_preserve_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_custom_pytree_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_custom_tag_metadata_re_export_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_decomp_batch_norm_functional_predispatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_decomp_item_in_prim_after_decomposition_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_decomp_item_in_prim_before_decomposition_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_default_decomposition_core_cia_ops_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_1_2_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_basic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_integer_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_nested_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_out_of_order_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_out_of_order_repeat_derived_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_out_of_order_simplified_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_derived_dim_repeat_derived_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_detect_leak_nonstrict_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_detect_leak_nonstrict_with_stacktrace_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_detect_leak_strict_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_device_to_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_device_to_gpu_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_device_to_mutation_float_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_device_to_mutation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_device_to_static_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_1_2_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_auto_and_dim_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_dynamic_divisibility_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_dynamic_specialization_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_hint_range_violations_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dim_hint_ranges_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_disable_forced_specializations_errors_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_disable_forced_specializations_ok_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_distributed_all_gather_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_distributed_all_gather_into_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_distributed_all_reduce_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_distributed_all_to_all_single_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_distributed_reduce_scatter_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dont_duck_size_for_auto_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_double_lifted_constants_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_draft_export_checks_aliasing_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_draft_export_checks_mutation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_draft_export_checks_mutation_list_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_draft_export_checks_mutation_with_nan_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_draft_export_fake_kernel_inference_errors_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_draft_export_infers_fake_kernel_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_duplicate_modules_with_non_persistent_buffers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_lr_shift_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_bounds_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_builder_basic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_builder_kwargs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_builder_pytree_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_dataclass_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_inferred_basic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_serdes_generic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_serdes_user_errors_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_serdes_various_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_shapes_spec_with_pytree_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_dynamic_sym_round_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_ends_of_bounds_oblivious_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_error_does_not_reference_eager_fallback_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_error_when_passing_mutating_primitive_op_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_exception_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_expand_copy_export_handles_implicit_true_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_api_with_dynamic_shapes_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_as_backend_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_associative_scan_lifted_buffers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_associative_scan_symbol_dim_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_associative_scan_symbol_scandim_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_aten_to_unflatten_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_aten_to_unflatten_subclass_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_cond_symbool_pred_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_cond_warns_constant_pred_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_custom_decomp_table_basic_pop_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_custom_decomp_table_container_methods_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_custom_op_lib_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_custom_triton_kernel_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_custom_triton_kernel_mutable_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_cyclic_reference_leak_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_decomp_torture_case_1_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_decomp_torture_case_2_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_decomps_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_decomps_simple_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_dynamo_config_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_for_training_run_decomp_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_for_training_with_container_type_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_for_training_with_dynamic_shapes_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_for_training_with_mutation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_for_training_with_state_dict_hooks_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_default_kwargs_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_keyword_only_args_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_kwargs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_pytree_kwargs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_var_keyword_args_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_var_keyword_pytree_args_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_func_with_var_postional_args_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_function_schema_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_graph_with_no_inputs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_input_mutation_bug_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_input_mutation_dynamic_shape_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_input_mutation_static_shape_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_linear_preserve_dynamic_shape_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_max_nonstrict_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_max_onnx_reported_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_method_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_mod_constraints_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_module_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_preserve_linear_at_aot_level_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_preserve_linear_but_not_custom_op_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_scan_pytree_output_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_script_module_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_statically_known_true_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_then_compile_tensor_ctor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_autocast_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_fake_tensor_inputs_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_inline_constraints_complex_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_inline_constraints_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_set_grad_enabled_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_export_with_wrong_inputs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_external_call_non_strict_real_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_fake_inputs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_fake_weights_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_filter_traceback_frames_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_float_conversion_from_int_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_float_conversion_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_fqn_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_from_node_metadata_export_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_full_on_scalar_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_function_holding_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_hints_wrapper_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_hoo_inline_users_issue_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_if_functional_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_if_post_autograd_op_preserved_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_inline_script_class_method_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_inline_script_class_method_recursive_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_inline_script_function_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_inline_script_method_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_int_shape_specialization_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_intermediate_shape_comp_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_is_exporting_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_is_non_negative_check_function_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_is_nonzero_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_isnonzero_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_issue_113041_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_issue_157289_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_istft_op_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_keep_composite_ops_invalid_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_keep_composite_ops_linear_convd_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_kwarg_dynamic_shapes_diff_order_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_kwargs_reorder_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_layer_norm_unbacked_normalized_shape_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_layer_sharing_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_lazy_module_kwargs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_lifted_constants_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_linear_conv_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_malformed_fqn_from_source_name_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_map_buffers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_map_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_mask_nonzero_static_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_masked_select_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_math_pow_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_mismatched_dynamic_shapes_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_mixed_input_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_module_dict_key_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_module_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_module_input_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_module_input_subclasses_parameterization_nested_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_module_list_slice_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_module_with_dict_container_inp_out_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_modules_access_for_deleted_submodule_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_more_multidimensional_slicing_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_multidimensional_slicing_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_multinomial_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_multiple_definitions_same_name_dim_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nested_dynamic_shapes_spec_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nested_module_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nested_module_with_constant_buffer_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nested_module_with_init_buffer_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nested_module_with_parameter_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nn_module_stack_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nn_module_stack_shared_submodule_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_no_check_is_size_error_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_no_tensor_computation_2_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_no_tensor_computation_3_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_no_tensor_computation_4_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_no_tensor_computation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_non_arg_name_dynamic_shapes_api_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_non_persistent_buffer_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_non_strict_dynamic_shapes_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_none_buffers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nonstrict_retrace_preserves_metadata_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nonzero_2_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_nonzero_dynamic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_not_registered_parameter_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_operator_aten_tensor_mode_variant_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_output_node_name_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_pad_sequence_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_param_util_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_partial_patched_forward_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_placeholder_naming_collisions_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_placeholder_naming_order_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_placeholder_naming_order_variadic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_placeholder_update_preserving_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_predispatch_cond_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_predispatch_grad_wrappers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_preserve_module_call_signature_unflatten_specialization_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_preserve_requires_grad_placeholders_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_profiling_code_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_python_asserts_with_sym_int_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_pytree_register_data_class_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_pytree_register_nested_data_class_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_range_constraints_with_replacement_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_real_tensor_alias_dtype_mismatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_real_tensor_bool_cast_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_real_tensor_for_max_op_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_real_tensor_size_mismatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_redundant_assert_max_upper_bound_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_redundant_asserts_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_register_constant_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_repeat_interleave_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_replace_unbacked_with_very_large_upperbound_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_replaced_unbacked_bindings_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_reshape_view_helper_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_retracable_ep_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_retrace_pre_autograd_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_run_decomposition_supports_user_input_mutation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_run_decompositions_keep_metadata_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_runtime_assert_for_prim_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_runtime_assert_for_prm_str_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_runtime_assert_with_size_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_sdpa_gqa_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_sequential_slicing_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_set_example_inputs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_set_grad_as_side_effect_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_set_grad_empty_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_set_grad_unflatten_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_setgrad_lifted_tensor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_shared_submodule_nn_module_stack_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_simple_export_for_training_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_simple_unbacked_view_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_size_input_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_slice_nn_module_stack_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_solver_unsupported_sympy_function_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_specialize_derived_dim_roots_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_split_const_gm_with_lifted_constants_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_stack_trace_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_stack_trace_make_fx_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_state_primitives_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_state_shape_attribute_assignment_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_state_tensors_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_static_dim_constraints_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclass_nested_attr_access_complicated_metadata_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclass_nested_attr_access_const_metadata_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclass_nested_attr_access_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclass_nested_attr_access_submodule_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclasses_parameterization_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_subclasses_parameterization_nested_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_suggest_torch_checks_with_non_negative_check_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_suggest_torch_checks_with_regular_check_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_suggested_fixes_new_roots_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_sym_float_operators_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_sym_or_sym_and_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_sym_sqrt_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symbool_item_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symfloat_item_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_input_additional_inputs_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_input_basic_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_input_ranges_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_input_shapes_collection_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_input_specialization_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_item_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_output_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_symint_tensor_return_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_tensor_attribute_zero_args_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_tensor_constant_aten_to_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_tensor_constant_with_wrapped_method_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_to_module_with_mutated_buffer_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_to_module_with_mutated_buffer_multiple_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_tolist_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_torch_check_eq_commutativity_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_torch_fn_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_trace_under_fake_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_train_eval_on_exported_preautograd_module_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_3d_matmul_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_bincount_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_bindings_for_divisible_u_symint_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_deferred_runtime_retrace_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_expand_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_infer_size_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_kth_value_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_linear_layer_norm_input_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_noncontig_lin_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_pad_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_scalar_constructor_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_slice_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_to_cond_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_to_cond_passthrough_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unbacked_unsqueeze_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_asserts_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_buffer_update_child2parent_swap_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_closure_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_isinstance_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_multiple_graphs_dispatch_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_multiple_graphs_shared_submodule_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_multiple_graphs_state_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_no_unroll_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_placeholder_update_child2parent_swap_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_5_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_6_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_buf_8_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_const_preserving_3_1_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_const_preserving_3_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_4_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_6_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_9_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unflatten_random_dag_preserving_4_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unused_aliases_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_unused_constant_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_use_embedding_twice_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_user_input_and_buffer_mutation_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_vmap_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_while_loop_assert_separation_inline_and_install_strict, 
test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_while_loop_index_assertions_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_while_loop_simple_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_while_loop_tensor_constant_idx_inline_and_install_strict, test/export/test_export_with_inline_and_install.py::InlineAndInstallStrictExportTestExport::test_wrapper_module_inline_and_install_strict 2025-09-07T07:26:50.6038850Z 2025-09-07T07:26:50.6039017Z Running dynamo/test_nops 1/1 ... [2025-09-07 07:26:50.562091] 2025-09-07T07:26:50.6039378Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:50.6040275Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_nops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:50.562470] 2025-09-07T07:26:51.4351740Z 2025-09-07T07:26:51.4352766Z inductor/test_b2b_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_b2b_gemm_1.1_23efdeaaa01cb217_.log 2025-09-07T07:26:51.4353907Z Running 0 items in this shard: 2025-09-07T07:26:51.4354194Z 2025-09-07T07:26:51.4355075Z Running torch_np/test_nep50_examples 1/1 ... [2025-09-07 07:26:51.435314] 2025-09-07T07:26:51.4355679Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:51.4358970Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_nep50_examples.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:51.435698] 2025-09-07T07:26:52.7839252Z 2025-09-07T07:26:52.7840593Z inductor/test_template_heuristics_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_template_heuristics_registry_1.1_0426c40d73b09961_.log 2025-09-07T07:26:52.7844456Z Running 5 items in this shard: test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_assertion_existing_class, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_fallback_behavior, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_hierarchy_lookup, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_partial_hierarchy_scenarios, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_register_class 2025-09-07T07:26:52.7847294Z 2025-09-07T07:26:52.7847562Z Running torch_np/test_binary_ufuncs 1/1 ... [2025-09-07 07:26:52.784037] 2025-09-07T07:26:52.7848052Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:52.7849291Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_binary_ufuncs.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:52.784419] 2025-09-07T07:26:53.0300843Z 2025-09-07T07:26:53.0302012Z inductor/test_inductor_annotations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_annotations_1.1_d38029ae8c6284de_.log 2025-09-07T07:26:53.0304971Z Running 2 items in this shard: test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_no_annotations, test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_training_annotation 2025-09-07T07:26:53.0306171Z 2025-09-07T07:26:53.0306389Z Running inductor/test_best_config 1/1 ... [2025-09-07 07:26:53.030296] 2025-09-07T07:26:53.0306799Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:53.0309324Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_best_config.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:53.030729] 2025-09-07T07:26:53.1978202Z 2025-09-07T07:26:53.1979487Z test_ao_sparsity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ao_sparsity_1.1_15285edfa91c1284_.log 2025-09-07T07:26:53.2011038Z Running 88 items in this shard: test/test_ao_sparsity.py::TestQuantizedSparseKernels::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear_serdes, test/test_ao_sparsity.py::TestFakeSparsity::test_jit_trace, test/test_ao_sparsity.py::TestFakeSparsity::test_masking_logic, test/test_ao_sparsity.py::TestFakeSparsity::test_state_dict_preserved, test/test_ao_sparsity.py::TestFakeSparsity::test_weights_parametrized, test/test_ao_sparsity.py::TestCubicScheduler::test_constructor, test/test_ao_sparsity.py::TestCubicScheduler::test_step, test/test_ao_sparsity.py::TestScheduler::test_constructor, test/test_ao_sparsity.py::TestScheduler::test_lambda_scheduler, test/test_ao_sparsity.py::TestScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestScheduler::test_step, test/test_ao_sparsity.py::TestBaseSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseSparsifier::test_convert, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params1, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params2, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params3, test/test_ao_sparsity.py::TestBaseSparsifier::test_prepare_config, test/test_ao_sparsity.py::TestBaseSparsifier::test_state_dict, test/test_ao_sparsity.py::TestBaseSparsifier::test_step, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_constructor, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_prepare, 
test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_constructor, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_prepare, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step_2_of_4, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_complex_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_activation_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_bias_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_padding_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_pool_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_activation_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_bias_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_single_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_single_layer, 
test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_linear, test/test_ao_sparsity.py::TestFPGMPruner::test_compute_distance, test/test_ao_sparsity.py::TestFPGMPruner::test_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_lstm_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestComposability::test_convert_without_squash_mask, test/test_ao_sparsity.py::TestComposability::test_fusion_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_q_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_qat_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_fusion, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_q_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_qat_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_before_s_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_s_prep_ref_conv, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_q_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_qat_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_q_prep_fx_ref, test/test_ao_sparsity.py::TestActivationSparsifier::test_activation_sparsifier, test/test_ao_sparsity.py::TestBaseDataScheduler::test_constructor, test/test_ao_sparsity.py::TestBaseDataScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestBaseDataScheduler::test_state_dict, test/test_ao_sparsity.py::TestBaseDataScheduler::test_step, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_embeddings, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_parameters, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_tensors, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_embeddings, 
test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_parameters, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_tensors, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_quantize_first, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_sparsify_first, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_for_tensors, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_root 2025-09-07T07:26:53.2032099Z 2025-09-07T07:26:53.2032264Z Running test_hop_infra 1/1 ... [2025-09-07 07:26:53.198075] 2025-09-07T07:26:53.2032708Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:53.2033601Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hop_infra.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:53.198423] 2025-09-07T07:26:54.4825161Z 2025-09-07T07:26:54.4826395Z dynamo/test_nops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nops_1.1_f7e8ddbce4d875b2_.log 2025-09-07T07:26:54.4828381Z Running 4 items in this shard: test/dynamo/test_nops.py::NopTests::test1, test/dynamo/test_nops.py::NopTests::test2, test/dynamo/test_nops.py::NopTests::test3, test/dynamo/test_nops.py::NopTests::test_extended_args 2025-09-07T07:26:54.4829487Z 2025-09-07T07:26:54.4829781Z Running torch_np/test_unary_ufuncs 1/1 ... 
[2025-09-07 07:26:54.482589]
2025-09-07T07:26:54.4830341Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:54.4831778Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_unary_ufuncs.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:54.482926]
2025-09-07T07:26:55.8990692Z
2025-09-07T07:26:55.8991876Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_437e4d65deb82ad1_.log
2025-09-07T07:26:55.8993650Z Running 3 items in this shard: test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_multi_tempalte, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_printer_const, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_trace
2025-09-07T07:26:55.8994784Z
2025-09-07T07:26:55.8995072Z Running inductor/test_aot_inductor_package 1/1 ... [2025-09-07 07:26:55.899077]
2025-09-07T07:26:55.8995574Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:55.8996779Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_package.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-09-07 07:26:55.899412] 2025-09-07T07:26:56.6042505Z 2025-09-07T07:26:56.6044115Z torch_np/test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_binary_ufuncs_1.1_c8263689a951c0b4_.log 2025-09-07T07:26:56.6056638Z Running 38 items in this shard: test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_add, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_arctan2, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_bitwise_and, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_bitwise_or, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_bitwise_xor, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_copysign, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_divide, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_equal, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_float_power, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_floor_divide, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_fmax, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_fmin, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_fmod, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_gcd, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_greater, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_greater_equal, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_heaviside, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_hypot, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_lcm, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_ldexp, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_left_shift, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_less, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_less_equal, 
test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logaddexp, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logaddexp2, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logical_and, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logical_or, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logical_xor, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_matmul, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_maximum, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_minimum, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_multiply, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_nextafter, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_not_equal, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_power, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_remainder, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_right_shift, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_subtract 2025-09-07T07:26:56.6067324Z 2025-09-07T07:26:56.6067497Z Running inductor/test_pad_mm 1/1 ... [2025-09-07 07:26:56.604372] 2025-09-07T07:26:56.6067867Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:26:56.6068787Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_pad_mm.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:56.604670] 2025-09-07T07:26:56.9084775Z 2025-09-07T07:26:56.9085562Z torch_np/test_nep50_examples 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_nep50_examples_1.1_987b387d6577c892_.log 2025-09-07T07:26:56.9645671Z Running 1573 items in this shard: test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_3j + array(3, complex64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_True + uint8(2), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array(1_0, float32) + 1e-14 == 1_0, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([0_1], float32) == float64(0_1), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([100], uint8) + 200, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 1, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 200, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 300, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + array(1, int64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + int64(1), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_0], float32) + 1e-14 == 1_0, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + 3, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + array(1_, float64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + float64(1_), 
test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + int64(3), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_bool_(True) + 1, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(1) + 1j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(1) + 3e100, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(5) + 5j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int16(2) + 2, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int16(4) + 4j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int32(1) + 5j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(1) + 2, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(1) + 300, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(100) + 200, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array25, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array25, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar33_array33, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array23, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar31_array31, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar29_array29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar27_array27, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array18, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array17, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array13, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array1, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array26, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar33_array33, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar33_array33, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array23, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar31_array31, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar29_array29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar27_array27, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array7, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array16, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array13, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array26, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar32_array32, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array23, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar31_array31, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar27_array27_dtype27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar28_array28_dtype28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar29_array29_dtype29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar30_array30_dtype30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar31_array31_dtype31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar32_array32_dtype32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar33_array33_dtype33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar34_array34_dtype34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar35_array35_dtype35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array10_dtype10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array11_dtype11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array12_dtype12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array13_dtype13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array14_dtype14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array15_dtype15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array16_dtype16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array17_dtype17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array9_dtype9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array18_dtype18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array19_dtype19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array20_dtype20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array21_dtype21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array22_dtype22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array23_dtype23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array24_dtype24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array25_dtype25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array26_dtype26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array0_dtype0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array1_dtype1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array2_dtype2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array3_dtype3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array4_dtype4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array5_dtype5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array6_dtype6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array7_dtype7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array8_dtype8
2025-09-07T07:26:57.0181278Z
2025-09-07T07:26:57.0181505Z Running typing/test_python_operators 1/1 ... [2025-09-07 07:26:56.911243]
2025-09-07T07:26:57.0181904Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:57.0182835Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'typing/test_python_operators.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:56.911615]
2025-09-07T07:26:57.4752447Z
2025-09-07T07:26:57.4754577Z inductor/test_async_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_async_compile_1.1_49e80b6ac6ee0667_.log
2025-09-07T07:26:57.4760240Z Running 8 items in this shard: test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_bad_kernel, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_wait_pool_ready
2025-09-07T07:26:57.4764245Z
2025-09-07T07:26:57.4764631Z Running inductor/test_aot_inductor_custom_ops 1/1 ... [2025-09-07 07:26:57.475336]
2025-09-07T07:26:57.4765309Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:57.4766910Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_custom_ops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:26:57.475728]
2025-09-07T07:26:57.6695535Z
2025-09-07T07:26:57.6696783Z test_hop_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hop_infra_1.1_726454e4af7ece82_.log
2025-09-07T07:26:57.6698338Z Running 3 items in this shard: test/test_hop_infra.py::TestHOPInfra::test_all_hops_are_imported, test/test_hop_infra.py::TestHOPInfra::test_all_hops_have_opinfo, test/test_hop_infra.py::TestHOPInfra::test_imports_from_all_work
2025-09-07T07:26:57.6699335Z
2025-09-07T07:26:57.6699605Z Running inductor/test_cudagraph_trees 1/1 ... [2025-09-07 07:26:57.669656]
2025-09-07T07:26:57.6700094Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:57.6703236Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudagraph_trees.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:57.670061]
2025-09-07T07:26:58.2526699Z
2025-09-07T07:26:58.2527693Z torch_np/test_unary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_unary_ufuncs_1.1_f35297146532e8d5_.log
2025-09-07T07:26:58.2543131Z Running 42 items in this shard: test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_absolute, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_arccos, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_arccosh, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_arcsin, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_arcsinh, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_arctan, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_arctanh, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_cbrt, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_ceil, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_conjugate, 
test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_cos, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_cosh, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_deg2rad, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_degrees, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_exp, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_exp2, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_expm1, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_fabs, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_floor, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_isfinite, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_isinf, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_isnan, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_log, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_log10, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_log1p, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_log2, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_logical_not, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_negative, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_positive, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_rad2deg, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_radians, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_reciprocal, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_rint, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_sign, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_signbit, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_sin, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_sinh, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_sqrt, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_square, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_tan, 
test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_tanh, test/torch_np/test_unary_ufuncs.py::TestUnaryUfuncs::test_trunc
2025-09-07T07:26:58.2553345Z
2025-09-07T07:26:58.2553559Z Running inductor/test_compile_worker 1/1 ... [2025-09-07 07:26:58.252714]
2025-09-07T07:26:58.2553976Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:26:58.2554956Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_worker.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:26:58.253050]
2025-09-07T07:27:00.2048056Z
2025-09-07T07:27:00.2049476Z inductor/test_best_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_best_config_1.1_6acc09cbd32890e8_.log
2025-09-07T07:27:00.2051212Z Running 1 items in this shard: test/inductor/test_best_config.py::TestKernelBestConfig::test_best_config_has_triton_cache_key
2025-09-07T07:27:00.2052042Z
2025-09-07T07:27:00.2052350Z Running dynamo/test_modules 1/1 ... [2025-09-07 07:27:00.204986]
2025-09-07T07:27:00.2052969Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:27:00.2056289Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_modules.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:00.205368]
2025-09-07T07:27:01.3823493Z
2025-09-07T07:27:01.3824593Z typing/test_python_operators 1/1 was successful, full logs can be found in artifacts with path test/test-reports/typing.test_python_operators_1.1_fcb7f31958d4263f_.log
2025-09-07T07:27:01.3913510Z Running 318 items in this shard: test/typing/test_python_operators.py::TestPythonOperators::test_binary_a100_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a101_op_%_b101, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a102_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a103_op_%_b103, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a104_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a105_op_*_b105, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a106_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a107_op_*_b107, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a108_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a109_op_**_b109, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a110_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a111_op_**_b111, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a112_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a113_op_+_b113, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a114_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a115_op_+_b115, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a116_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a117_op_-_b117, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a118_op_-_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a119_op_-_b119, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a120_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a121_op_/_b121, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a122_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a123_op_/_b123, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a124_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a125_op_//_b125, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a126_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a127_op_//_b127, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a128_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a129_op_&_b129, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a130_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a131_op_&_b131, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a132_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a133_op_<<_b133, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a134_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a135_op_<<_b135, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a136_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a137_op_>>_b137, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a138_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a139_op_>>_b139, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a140_op_^_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a141_op_^_b141, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a142_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a143_op_^_b143, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a144_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a145_op_|_b145, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a146_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a147_op_|_b147, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a148_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a149_op_@_b149, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a150_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a151_op_@_b151, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a228_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a229_op_!=_b229, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a230_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a231_op_!=_b231, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a232_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a233_op_<_b233, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a234_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a235_op_<_b235, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a236_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a237_op_<=_b237, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a238_op_<=_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a239_op_<=_b239, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a240_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a241_op_==_b241, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a242_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a243_op_==_b243, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a244_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a245_op_>_b245, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a246_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a247_op_>_b247, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a248_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a249_op_>=_b249, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a250_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a251_op_>=_b251, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a252_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a253_op_%_b253, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a254_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a255_op_%_b255, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a256_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a257_op_*_b257, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a258_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a259_op_*_b259, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a260_op_**_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a261_op_**_b261, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a262_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a263_op_**_b263, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a264_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a265_op_+_b265, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a266_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a267_op_+_b267, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a268_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a269_op_-_b269, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a270_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a271_op_-_b271, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a272_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a273_op_/_b273, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a274_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a275_op_/_b275, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a276_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a277_op_//_b277, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a278_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a279_op_//_b279, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a280_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a281_op_&_b281, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a282_op_&_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a283_op_&_b283, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a284_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a285_op_<<_b285, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a286_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a287_op_<<_b287, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a288_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a289_op_>>_b289, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a290_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a291_op_>>_b291, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a292_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a293_op_^_b293, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a294_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a295_op_^_b295, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a296_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a297_op_|_b297, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a298_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a299_op_|_b299, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a300_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a301_op_@_b301, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a302_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a303_op_@_b303, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a76_op_!=_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a77_op_!=_b77, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a78_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a79_op_!=_b79, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a80_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a81_op_<_b81, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a82_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a83_op_<_b83, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a84_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a85_op_<=_b85, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a86_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a87_op_<=_b87, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a88_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a89_op_==_b89, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a90_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a91_op_==_b91, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a92_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a93_op_>_b93, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a94_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a95_op_>_b95, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a96_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a97_op_>=_b97, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a98_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a99_op_>=_b99, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b1, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b25, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b27, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b53, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b55, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b33, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b35, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b29, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b31, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b37, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b39, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b41, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b43, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b49, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b51, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b45, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b47, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b57, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b59, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b11, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b9, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b7, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b13, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b15, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b21, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b23, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b61, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b63, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b17, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b19, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b73, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b75, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b65, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b67, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b69, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b71, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b153, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b155, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b177, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b179, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b205, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b207, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b185, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b187, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b181, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b183, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b189, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b191, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b193, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b195, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b201, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b203, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b197, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b199, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b209, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b211, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b161, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b163, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b157, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b159, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b165, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b167, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b173, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b175, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b213, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b215, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b169, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b171, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b225, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b227, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b217, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b219, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b221, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b223, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_operators_are_correct_and_complete, test/typing/test_python_operators.py::TestPythonOperators::test_type_tests_are_complete, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a1, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a7, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a11, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a9, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_3 2025-09-07T07:27:01.3997688Z 2025-09-07T07:27:01.3997980Z Running test_transformers 1/1 ... [2025-09-07 07:27:01.382989] 2025-09-07T07:27:01.3998361Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:01.3999287Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:01.383357] 2025-09-07T07:27:03.2743979Z 2025-09-07T07:27:03.2745140Z inductor/test_aot_inductor_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_package_1.1_ceeccbe34ebaa296_.log 2025-09-07T07:27:03.2777245Z Running 88 items in this shard: test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_user_managed_weight, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_metadata, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_duplicate_calls, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_update_weights 2025-09-07T07:27:03.2806634Z 2025-09-07T07:27:03.2806813Z Running dynamo/test_global 1/1 ... 
[2025-09-07 07:27:03.274533] 2025-09-07T07:27:03.2807172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:03.2808162Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_global.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:03.274860] 2025-09-07T07:27:04.5305768Z 2025-09-07T07:27:04.5306761Z inductor/test_pad_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_pad_mm_1.1_8fed8c950b69835f_.log 2025-09-07T07:27:04.5313363Z Running 19 items in this shard: test/inductor/test_pad_mm.py::PadMMTest::test_cat_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_cat_padding, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_padding, test/inductor/test_pad_mm.py::PadMMTest::test_no_autocast_in_pad_bmm_joint_graph_pass, test/inductor/test_pad_mm.py::PadMMTest::test_original_aten_preserved_pad_mm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_2d_bias, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_mn, test/inductor/test_pad_mm.py::PadMMTest::test_pad_batch, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_b, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_bm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_bf16, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_mnk, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_n, test/inductor/test_pad_mm.py::PadMMTest::test_pad_single_cat, test/inductor/test_pad_mm.py::PadMMTest::test_zero_dim 2025-09-07T07:27:04.5318216Z 2025-09-07T07:27:04.5318438Z Running export/test_export 1/1 ... 
[2025-09-07 07:27:04.530738] 2025-09-07T07:27:04.5318868Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:04.5319976Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_export.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:04.531098] 2025-09-07T07:27:06.4531524Z 2025-09-07T07:27:06.4533168Z inductor/test_aot_inductor_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_custom_ops_1.1_a632191e34e86c87_.log 2025-09-07T07:27:06.4551357Z Running 35 items in this shard: test/inductor/test_aot_inductor_custom_ops.py::AOTInductorLoggingTest::test_shape_env_reuse, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_boxed_run_inputs_clearing_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_add_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_add_output_path_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_all_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_missing_arg_with_default_value_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_out_variant_without_return_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_return_list_of_single_tensor_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_return_single_tensor_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_square_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_concat_inputs_cpu, 
test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_multiple_outputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_reinterpret_view_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_int_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_nullopt_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_output_2_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_incorrect_custom_op_schema_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_boxed_run_inputs_clearing_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_add_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_add_output_path_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_all_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_missing_arg_with_default_value_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_out_variant_without_return_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_return_list_of_single_tensor_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_return_single_tensor_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_square_cuda, 
test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_with_concat_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_with_multiple_outputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_custom_op_with_reinterpret_view_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_fn_with_int_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_fn_with_optional_tensor_nullopt_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_fn_with_optional_tensor_output_2_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_fn_with_optional_tensor_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCuda::test_incorrect_custom_op_schema_cuda 2025-09-07T07:27:06.4565407Z 2025-09-07T07:27:06.4565549Z Running test_foreach 1/1 ... [2025-09-07 07:27:06.453292] 2025-09-07T07:27:06.4565900Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:06.4566780Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_foreach.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:06.453673] 2025-09-07T07:27:07.3954209Z 2025-09-07T07:27:07.3955191Z dynamo/test_global 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_global_1.1_d549229dee1f74d6_.log 2025-09-07T07:27:07.3960522Z Running 12 items in this shard: test/dynamo/test_global.py::TestGlobals::test_store_global_1, test/dynamo/test_global.py::TestGlobals::test_store_global_2, test/dynamo/test_global.py::TestGlobals::test_store_global_cross_file, test/dynamo/test_global.py::TestGlobals::test_store_global_crossfile_inline, test/dynamo/test_global.py::TestGlobals::test_store_global_dict, test/dynamo/test_global.py::TestGlobals::test_store_global_dict_2, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_1, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_2, test/dynamo/test_global.py::TestGlobals::test_store_global_list, test/dynamo/test_global.py::TestGlobals::test_store_global_list_2, test/dynamo/test_global.py::TestGlobals::test_store_global_new, test/dynamo/test_global.py::TestGlobals::test_store_global_object 2025-09-07T07:27:07.3965380Z 2025-09-07T07:27:07.3965756Z Running test_appending_byte_serializer 1/1 ... [2025-09-07 07:27:07.395626] 2025-09-07T07:27:07.3966429Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:07.3967906Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_appending_byte_serializer.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:07.395993] 2025-09-07T07:27:08.3995042Z 2025-09-07T07:27:08.3996017Z inductor/test_cudagraph_trees 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudagraph_trees_1.1_23296f5f37410a76_.log 2025-09-07T07:27:08.4051226Z Running 159 items in this shard: test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_grad, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_multiple_recordings, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_alias_of_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_output_checkpoint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_static_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_storage_single_weakref, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliasing_static_ref, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_amp_cache_disabled, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_backward_gets_cached_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cache_hit_forward_miss_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_boxed_forward_device_index, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpoint_shared_output_storage_deallocation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpointing_resets_persistent_refs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cleanup, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_compiled_autograd_static_input_params, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_constant_output, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_conv_benchmark, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cpp_wrapper, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_or_error, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_warmup, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_cpu_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_storage, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_end_recording_early, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_execution_into_recording, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_expanded_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_generation, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_frozen_fn, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_function_compiled_multiple_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_buffer_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_condition_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_only, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_op_and_dynamic_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar3, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar4, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_device_put, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_multiple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_tensor_symints, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_dynamoc_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_no_split, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_scalar_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_foreach_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward_not_called, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_fused_scheduler_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_gc, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_item, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_log_message, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_multiple_devices_msg, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reduce_overhead_mode_effectiveness, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu_interleave, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_simple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_cat_backward, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_mutation_index, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_nested_indirect_indexing, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint_multi_output_layout, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_item, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_backend, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_graph_breaks, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_index_put, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_live_outputs_multiple_graphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_manager_per_device, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mark_step, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_meta_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_child_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module_buffer, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_parent_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module_buffers, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_param_inputs, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multinomial, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_insert_removal_caching, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_reinplaced, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_no_rerecord_with_mark_static_address, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_not_fallback_to_eager_if_have_not_recompiling_too_many_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_output_alias, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_peristed_output_livenes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_remove_hooks_on_cached_tensors, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rerecord_if_static_input_address_changed, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_non_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_run_simple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_separate_recordings, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_side_stream_memory_allocation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_single_stream_use, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cpp_wrapper, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cudagraph_unsafe_ops, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_symbolic, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_sparsity, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_static_inputs_address_mutation_log, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_storage_access_error, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_constant_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_dies_between_checkpoint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_no_longer_in_pool, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_no_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_non_trees, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unstable_ptr, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warmup_stream_sync, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_on_pending_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_once_if_dynamic_shape_limit_reached, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_workspace_allocation_error, test/inductor/test_cudagraph_trees.py::TestSAC::test_cpu_and_cuda_rng, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraph_uneven_forward_backward, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal_device_one, test/inductor/test_cudagraph_trees.py::TestSAC::test_graph_partition_cudagraphs_aot_eager_compat_equal, test/inductor/test_cudagraph_trees.py::TestSAC::test_multi_device, test/inductor/test_cudagraph_trees.py::TestSAC::test_retain_graph, test/inductor/test_cudagraph_trees.py::TestSAC::test_simple, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order0, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order1, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order2, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order3, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order4, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order5 2025-09-07T07:27:08.4100383Z 2025-09-07T07:27:08.4100580Z Running test_fx_experimental 1/1 ... 
[2025-09-07 07:27:08.399873] 2025-09-07T07:27:08.4100955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:08.4101873Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_experimental.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:08.400238] 2025-09-07T07:27:09.2321629Z 2025-09-07T07:27:09.2322920Z inductor/test_compile_worker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_worker_1.1_d20679b64637dc41_.log 2025-09-07T07:27:09.2326240Z Running 5 items in this shard: test/inductor/test_compile_worker.py::TestCompileWorker::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorker::test_crash, test/inductor/test_compile_worker.py::TestCompileWorker::test_exception, test/inductor/test_compile_worker.py::TestCompileWorker::test_logging, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce 2025-09-07T07:27:09.2328490Z 2025-09-07T07:27:09.2328850Z Running inductor/test_triton_wrapper 1/1 ... [2025-09-07 07:27:09.232311] 2025-09-07T07:27:09.2329514Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:09.2331099Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_wrapper.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:09.232631] 2025-09-07T07:27:11.0658360Z 2025-09-07T07:27:11.0659823Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_0f2de03ac402bff3_.log 2025-09-07T07:27:11.0662635Z Running 3 items in this shard: test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_checksum, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_class, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_int 2025-09-07T07:27:11.0664434Z 2025-09-07T07:27:11.0664901Z Running inductor/test_torchinductor_strided_blocks 1/1 ... [2025-09-07 07:27:11.065879] 2025-09-07T07:27:11.0665667Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:11.0667670Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_strided_blocks.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:11.066211] 2025-09-07T07:27:11.8361377Z 2025-09-07T07:27:11.8362340Z dynamo/test_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_modules_1.1_52e228b5310dc15f_.log 2025-09-07T07:27:11.8399317Z Running 134 items in this shard: test/dynamo/test_modules.py::NNModuleTests::test_access_by_keys, test/dynamo/test_modules.py::NNModuleTests::test_basicmodule1, test/dynamo/test_modules.py::NNModuleTests::test_basicmodule2, test/dynamo/test_modules.py::NNModuleTests::test_call_fn_with_non_const_inputs_safe, test/dynamo/test_modules.py::NNModuleTests::test_cfgmod, test/dynamo/test_modules.py::NNModuleTests::test_children, test/dynamo/test_modules.py::NNModuleTests::test_constloop, test/dynamo/test_modules.py::NNModuleTests::test_conv_call_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_call_super_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_transpose_call_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_transpose_call_super_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_densenet, test/dynamo/test_modules.py::NNModuleTests::test_enumvalues, test/dynamo/test_modules.py::NNModuleTests::test_fnmember, test/dynamo/test_modules.py::NNModuleTests::test_fnmembercmp1, test/dynamo/test_modules.py::NNModuleTests::test_fnmembercmp2, test/dynamo/test_modules.py::NNModuleTests::test_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_generation_tag, test/dynamo/test_modules.py::NNModuleTests::test_hasattr, test/dynamo/test_modules.py::NNModuleTests::test_inject_module_parameters, test/dynamo/test_modules.py::NNModuleTests::test_intarg, test/dynamo/test_modules.py::NNModuleTests::test_iseval1, test/dynamo/test_modules.py::NNModuleTests::test_iseval2, test/dynamo/test_modules.py::NNModuleTests::test_isnonelayer, test/dynamo/test_modules.py::NNModuleTests::test_istraining1, 
test/dynamo/test_modules.py::NNModuleTests::test_istraining2, test/dynamo/test_modules.py::NNModuleTests::test_layerlist, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module1, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module2, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module4, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module5, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module6, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module7, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_bad_params, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_bad_params_call_function, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_kwargs, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_no_cls_to_become, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_speculation_log_divergence, test/dynamo/test_modules.py::NNModuleTests::test_module_attribute_precedence, test/dynamo/test_modules.py::NNModuleTests::test_module_call_module_with_static_forward, test/dynamo/test_modules.py::NNModuleTests::test_module_class_method, test/dynamo/test_modules.py::NNModuleTests::test_module_comparison, test/dynamo/test_modules.py::NNModuleTests::test_module_forward_has_graph_break, test/dynamo/test_modules.py::NNModuleTests::test_module_guard_name_is_valid, test/dynamo/test_modules.py::NNModuleTests::test_module_name_string, test/dynamo/test_modules.py::NNModuleTests::test_module_property, test/dynamo/test_modules.py::NNModuleTests::test_module_static_method, test/dynamo/test_modules.py::NNModuleTests::test_moduledict, test/dynamo/test_modules.py::NNModuleTests::test_moduledict_custom, test/dynamo/test_modules.py::NNModuleTests::test_modulelist, test/dynamo/test_modules.py::NNModuleTests::test_modulelist_custom, test/dynamo/test_modules.py::NNModuleTests::test_modulelist_nested, test/dynamo/test_modules.py::NNModuleTests::test_modulemethod1, 
test/dynamo/test_modules.py::NNModuleTests::test_modulemethod2, test/dynamo/test_modules.py::NNModuleTests::test_named_children, test/dynamo/test_modules.py::NNModuleTests::test_nn_module_setattr, test/dynamo/test_modules.py::NNModuleTests::test_nn_module_unspec_int_attr, test/dynamo/test_modules.py::NNModuleTests::test_nn_moduledict_contains, test/dynamo/test_modules.py::NNModuleTests::test_parameterdict, test/dynamo/test_modules.py::NNModuleTests::test_parameterdict_custom, test/dynamo/test_modules.py::NNModuleTests::test_parameters1, test/dynamo/test_modules.py::NNModuleTests::test_parameters2, test/dynamo/test_modules.py::NNModuleTests::test_parameters3, test/dynamo/test_modules.py::NNModuleTests::test_parameters4, test/dynamo/test_modules.py::NNModuleTests::test_parameters5, test/dynamo/test_modules.py::NNModuleTests::test_self_mutating1, test/dynamo/test_modules.py::NNModuleTests::test_seq, test/dynamo/test_modules.py::NNModuleTests::test_sequential_with_duplicated_module, test/dynamo/test_modules.py::NNModuleTests::test_sequential_with_duplicated_module2, test/dynamo/test_modules.py::NNModuleTests::test_simple_torch_function, test/dynamo/test_modules.py::NNModuleTests::test_stringmember, test/dynamo/test_modules.py::NNModuleTests::test_submodules1, test/dynamo/test_modules.py::NNModuleTests::test_submodules2, test/dynamo/test_modules.py::NNModuleTests::test_super1, test/dynamo/test_modules.py::NNModuleTests::test_super2, test/dynamo/test_modules.py::NNModuleTests::test_super_class_method, test/dynamo/test_modules.py::NNModuleTests::test_tensorlist, test/dynamo/test_modules.py::NNModuleTests::test_torch_function_with_closure, test/dynamo/test_modules.py::NNModuleTests::test_torch_mangled_class_name, test/dynamo/test_modules.py::NNModuleTests::test_unsupportedmethod, test/dynamo/test_modules.py::NNModuleTests::test_unsupportedmodule, test/dynamo/test_modules.py::NNModuleTests::test_viamodulecall, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_assign_does_not_exist, test/dynamo/test_modules.py::OptimizedModuleTest::test_attr, test/dynamo/test_modules.py::OptimizedModuleTest::test_attr_precedence, test/dynamo/test_modules.py::OptimizedModuleTest::test_backward_hooks, test/dynamo/test_modules.py::OptimizedModuleTest::test_branch_on_nn_module_custom_bool, test/dynamo/test_modules.py::OptimizedModuleTest::test_branch_on_nn_module_custom_len, test/dynamo/test_modules.py::OptimizedModuleTest::test_buffer_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_composition, test/dynamo/test_modules.py::OptimizedModuleTest::test_composition_with_opt_mod, test/dynamo/test_modules.py::OptimizedModuleTest::test_delattr_on_compiled_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_dir, test/dynamo/test_modules.py::OptimizedModuleTest::test_dunder_call_explicitly, test/dynamo/test_modules.py::OptimizedModuleTest::test_globals_change_in_other_file, test/dynamo/test_modules.py::OptimizedModuleTest::test_guard_on_torch_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules_compiles, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules_compiles_self_contained, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_inner, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_outer, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_skip_guards, test/dynamo/test_modules.py::OptimizedModuleTest::test_inline_inbuilt_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_nn_module_tensor, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_previously_seen_tensor, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_with_freezing, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_keys, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_name, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_values, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_patch, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_setattr, test/dynamo/test_modules.py::OptimizedModuleTest::test_monkeypatching_forward, test/dynamo/test_modules.py::OptimizedModuleTest::test_nn_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_no_op_assignment, test/dynamo/test_modules.py::OptimizedModuleTest::test_no_recompile_on_nn_guarded_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_overridden_call, test/dynamo/test_modules.py::OptimizedModuleTest::test_param_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_param_requires_grad, test/dynamo/test_modules.py::OptimizedModuleTest::test_patch_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_recompile_limit_on_freed_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_recompile_limit_on_guarded_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_recursion, test/dynamo/test_modules.py::OptimizedModuleTest::test_save_and_load_all_backends, test/dynamo/test_modules.py::OptimizedModuleTest::test_save_and_load_inductor, test/dynamo/test_modules.py::OptimizedModuleTest::test_setattr_on_compiled_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_to, test/dynamo/test_modules.py::OptimizedModuleTest::test_trace_delattr, test/dynamo/test_modules.py::OptimizedModuleTest::test_udo_instance_method_as_hook, test/dynamo/test_modules.py::OptimizedModuleTest::test_unhashable_nn_submodule, test/dynamo/test_modules.py::OptimizedModuleTest::test_unspec_non_inlinable_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_unspecialized_seq, test/dynamo/test_modules.py::OptimizedModuleTest::test_user_defined_nn_module_dynamic, 
test/dynamo/test_modules.py::NNModuleTestsDeviceCUDA::test_lazy_module3_cuda
2025-09-07T07:27:11.8430392Z
2025-09-07T07:27:11.8430561Z Running test_file_check 1/1 ... [2025-09-07 07:27:11.836431]
2025-09-07T07:27:11.8430907Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:27:11.8431906Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_file_check.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:11.836778]
2025-09-07T07:27:15.1751139Z
2025-09-07T07:27:15.1752212Z test_fx_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_experimental_1.1_6b3296092051ee41_.log
2025-09-07T07:27:15.2042849Z Running 724 items in this shard: test/test_fx_experimental.py::TestFXExperimental::test_annotate_getitem_node, test/test_fx_experimental.py::TestFXExperimental::test_annotate_returns_with_schema, test/test_fx_experimental.py::TestFXExperimental::test_aot_based_partition, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_no_msg, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_with_empty_msg, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_with_msg, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_with_multiline_message, test/test_fx_experimental.py::TestFXExperimental::test_conv_bn_fusion, test/test_fx_experimental.py::TestFXExperimental::test_conv_bn_fusion_mixed_dtype, test/test_fx_experimental.py::TestFXExperimental::test_conv_bn_fusion_not_running_state, test/test_fx_experimental.py::TestFXExperimental::test_cost_aware_partition, test/test_fx_experimental.py::TestFXExperimental::test_fetch, test/test_fx_experimental.py::TestFXExperimental::test_find_single_partition, test/test_fx_experimental.py::TestFXExperimental::test_lack_of_devices, 
test/test_fx_experimental.py::TestFXExperimental::test_large_node_error, test/test_fx_experimental.py::TestFXExperimental::test_merge_matmuls, test/test_fx_experimental.py::TestFXExperimental::test_meta_tracer, test/test_fx_experimental.py::TestFXExperimental::test_normalize_args, test/test_fx_experimental.py::TestFXExperimental::test_normalize_args_perserve_type, test/test_fx_experimental.py::TestFXExperimental::test_normalize_args_preserve_meta, test/test_fx_experimental.py::TestFXExperimental::test_normalize_binary_operators, test/test_fx_experimental.py::TestFXExperimental::test_normalize_modules_exhaustive, test/test_fx_experimental.py::TestFXExperimental::test_optimize_for_inference_cpu, test/test_fx_experimental.py::TestFXExperimental::test_optimize_for_inference_cpu_torchvision, test/test_fx_experimental.py::TestFXExperimental::test_partition_device_mapping, test/test_fx_experimental.py::TestFXExperimental::test_partition_latency, test/test_fx_experimental.py::TestFXExperimental::test_partition_node_manipulation, test/test_fx_experimental.py::TestFXExperimental::test_replace_target_nodes_with, test/test_fx_experimental.py::TestFXExperimental::test_saturate_host, test/test_fx_experimental.py::TestFXExperimental::test_size_based_partition, test/test_fx_experimental.py::TestFXExperimental::test_sparse_nn_partition, test/test_fx_experimental.py::TestFXExperimental::test_split_module_dead_code, test/test_fx_experimental.py::TestFXExperimental::test_split_module_default_arg, test/test_fx_experimental.py::TestFXExperimental::test_split_module_input_names, test/test_fx_experimental.py::TestFXExperimental::test_split_module_keep_original_order_and_noop_graph, test/test_fx_experimental.py::TestFXExperimental::test_split_module_kwargs_expansion, test/test_fx_experimental.py::TestFXExperimental::test_split_module_return_node, test/test_fx_experimental.py::TestFXExperimental::test_split_module_symint_dependency_handling, 
test/test_fx_experimental.py::TestFXExperimental::test_split_qualname_mapping, test/test_fx_experimental.py::TestFXExperimental::test_subgraph_creation, test/test_fx_experimental.py::TestFXExperimental::test_subgraph_trivial_resnet, test/test_fx_experimental.py::TestFXExperimental::test_subgraph_uniquename, test/test_fx_experimental.py::TestFXExperimental::test_to_folder, test/test_fx_experimental.py::TestFXExperimental::test_traceable_function_with_nonstandard_name, test/test_fx_experimental.py::TestFXExperimental::test_type_matches, test/test_fx_experimental.py::TestTranslationValidation::test_sat, test/test_fx_experimental.py::TestTranslationValidation::test_sat_bitwise, test/test_fx_experimental.py::TestTranslationValidation::test_sympy_to_z3, test/test_fx_experimental.py::TestTranslationValidation::test_unsat, test/test_fx_experimental.py::TestTranslationValidation::test_z3str, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_args_op_overload_cuda, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_H_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_T_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___getitem___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___radd___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rdiv___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rmatmul___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rmod___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rmul___cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rpow___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rsub___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__chunk_cat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__softmax_backward_data_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_abs_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_acos_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_acosh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_add_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addbmm_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addcdiv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addcmul_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addmm_decomposed_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addmv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_alias_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_all_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_allclose_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_aminmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_angle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_any_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_arange_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argmin_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argsort_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argwhere_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_asin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_asinh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atan2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atan_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atanh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atleast_1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atleast_2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atleast_3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_baddbmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bernoulli_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bfloat16_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_block_diag_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bool_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_broadcast_shapes_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_broadcast_tensors_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_broadcast_to_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bucketize_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_byte_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cartesian_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cauchy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cdist_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cdouble_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ceil_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cfloat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_chalf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_char_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cholesky_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cholesky_inverse_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cholesky_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_chunk_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clamp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clamp_max_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clamp_min_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clone_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_column_stack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_combinations_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_complex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_conj_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_conj_physical_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_constant_pad_nd_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_contiguous_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_copysign_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_corrcoef_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cos_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cosh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_count_nonzero_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cov_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cross_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cummax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cummin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cumprod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cumsum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_deg2rad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diag_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diag_embed_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagflat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagonal_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagonal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagonal_scatter_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diff_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_digamma_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dist_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_div_floor_rounding_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_double_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dsplit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dstack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_einsum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_permuted_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_strided_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_eq_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_equal_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_erf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_erfc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_erfinv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_exp2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_exp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expand_as_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expand_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expand_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expm1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_exponential_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_eye_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fftshift_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_hfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_hfft_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_hfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifftshift_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ihfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ihfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ihfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_irfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_irfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_irfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_rfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_rfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_rfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fill_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_flatten_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_flip_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fliplr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_flipud_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_float_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_float_power_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_floor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_floor_divide_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fmod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_frac_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_frexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_full_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_full_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_gather_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ge_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_geometric_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_geqrf_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_gradient_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_grid_sampler_2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_grid_sampler_3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_gt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_half_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hash_tensor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_heaviside_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_histc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hsplit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hstack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hypot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_i0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_igamma_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_igammac_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_add_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_fill_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_put_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_select_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_inner_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_int_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isclose_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isfinite_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isinf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isnan_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isneginf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isposinf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isreal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_item_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_binary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_unary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_kron_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_kthvalue_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ldexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_le_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lerp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lgamma_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cholesky_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cond_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cross_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_det_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_diagonal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eig_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eigh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eigvals_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_householder_product_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_inv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lstsq_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_factor_ex_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_pinv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_pinv_hermitian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_qr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_slogdet_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_solve_ex_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_svd_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_svdvals_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_vander_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_vecdot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linspace_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log10_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log1p_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_normal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_softmax_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logaddexp2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logaddexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logcumsumexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logdet_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_and_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_not_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_or_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_xor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logspace_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logsumexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_long_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lu_solve_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lu_unpack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mH_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mT_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_argmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_argmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_cumprod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_cumsum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_fill_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_log_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_logaddexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_logsumexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_median_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_norm_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_normalize_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_select_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_softmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_std_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_sum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_var_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_matmul_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_matrix_exp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_binary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_maximum_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_median_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_min_binary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_minimum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mode_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_movedim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_msort_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mul_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_multinomial_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nan_to_num_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nanmean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nanmedian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nanquantile_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nansum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_narrow_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_narrow_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_native_batch_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_native_dropout_backward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_native_layer_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ne_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_neg_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_empty_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_empty_strided_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_full_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_ones_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_zeros_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nextafter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_batch_norm_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_celu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv_transpose3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_cosine_similarity_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_ctc_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_elu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_gelu_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_glu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_linear_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_linear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_pool3d_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_mish_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_normalize_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_relu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_rrelu_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_selu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_silu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_upsample_bilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nonzero_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nonzero_static_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_fro_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_inf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_nuc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_normal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_normal_in_place_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_normal_number_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ones_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ones_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ormqr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_outer_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_pca_lowrank_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_permute_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_permute_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_pinverse_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polar_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_positive_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_pow_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_put_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_qr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_quantile_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rad2deg_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rand_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randint_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randint_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randn_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ravel_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_real_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_reciprocal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_remainder_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_renorm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_repeat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_repeat_interleave_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_reshape_as_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_reshape_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resize__cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resize_as__cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resolve_conj_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resolve_neg_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_roll_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rot90_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_decimals_0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_decimals_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rsqrt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rsub_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scalar_tensor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_add_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_mean_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_searchsorted_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_select_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_select_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sgn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_short_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sigmoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sign_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_general_hamming_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_hann_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signbit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sinc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sinh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_slice_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_slice_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sort_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sparse_mm_reduce_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_airy_ai_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_j0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_j1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_y0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_y1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_v_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_entr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_erfcx_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_i0e_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_i1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_i1e_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_log_ndtr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_ndtr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_ndtri_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_spherical_bessel_j0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_xlog1py_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_zeta_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_list_args_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_with_sizes_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sqrt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_square_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_squeeze_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_squeeze_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_squeeze_multiple_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_stack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_stft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sub_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sum_to_size_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_svd_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_svd_lowrank_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_t_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_t_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_take_along_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_take_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tan_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tanh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tensor_split_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tensordot_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tile_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_to_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_to_sparse_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_topk_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trace_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_transpose_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_transpose_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trapezoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trapz_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_triangular_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tril_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_triu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_true_divide_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trunc_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unbind_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unbind_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unflatten_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unfold_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unfold_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_uniform_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unique_consecutive_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unique_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsafe_chunk_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsafe_split_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsqueeze_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_mean_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_unbiased_cuda_float32, 
test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_vdot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_as_complex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_as_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_vsplit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_vstack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_where_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_xlogy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_zero__cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_zeros_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_zeros_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_quantized_eb_cuda 2025-09-07T07:27:15.2347299Z 2025-09-07T07:27:15.2347590Z Running dynamo/test_interop 1/1 ... [2025-09-07 07:27:15.176344] 2025-09-07T07:27:15.2347976Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:15.2348920Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_interop.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:15.176767] 2025-09-07T07:27:15.4061928Z 2025-09-07T07:27:15.4062780Z test_file_check 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_file_check_1.1_2c79218cc7a7b64b_.log 2025-09-07T07:27:15.4063980Z Running 2 items in this shard: test/test_file_check.py::TestFileCheck::test_all_python_api, test/test_file_check.py::TestFileCheck::test_not_run 2025-09-07T07:27:15.4064637Z 2025-09-07T07:27:15.4064905Z Running dynamo/test_metrics_context 1/1 ... [2025-09-07 07:27:15.406283] 2025-09-07T07:27:15.4065385Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:15.4067822Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_metrics_context.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:15.406589] 2025-09-07T07:27:16.0562221Z 2025-09-07T07:27:16.0563658Z inductor/test_triton_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_wrapper_1.1_67d4e1ccdfde8254_.log 2025-09-07T07:27:16.0565362Z Running 1 items in this shard: test/inductor/test_triton_wrapper.py::TestTritonWrapper::test_wrapper_using_gpu_seed 2025-09-07T07:27:16.0566091Z 2025-09-07T07:27:16.0566435Z Running test_functionalization 1/1 ... [2025-09-07 07:27:16.056299] 2025-09-07T07:27:16.0567066Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:16.0569064Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functionalization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:16.056620] 2025-09-07T07:27:16.7125596Z 2025-09-07T07:27:16.7126540Z export/test_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_1.1_67d0ade5d94187aa_.log 2025-09-07T07:27:16.7235755Z Running 445 items in this shard: test/export/test_export.py::TestDynamismExpression::test_export_assume_static_by_default, test/export/test_export.py::TestDynamismExpression::test_export_constraints_error, test/export/test_export.py::TestDynamismExpression::test_export_constraints_error_not_in_range, test/export/test_export.py::TestDynamismExpression::test_export_inline_constraints, test/export/test_export.py::TestDynamismExpression::test_export_slice_maxsize, test/export/test_export.py::TestDynamismExpression::test_export_slice_unbacked_dim1, test/export/test_export.py::TestDynamismExpression::test_export_strict_narrow_unbacked_expr, test/export/test_export.py::TestDynamismExpression::test_no_grad_param_inplace, test/export/test_export.py::TestDynamismExpression::test_reshape_view_backed_size_oblivious, test/export/test_export.py::TestExport::test__scaled_dot_product_flash_attention, test/export/test_export.py::TestExport::test_additional_inputs_constants, test/export/test_export.py::TestExport::test_allow_explicit_guards_as_runtime_asserts, test/export/test_export.py::TestExport::test_args_type_checked, test/export/test_export.py::TestExport::test_aten_lift_fresh_copy, test/export/test_export.py::TestExport::test_attention, test/export/test_export.py::TestExport::test_attr_assignment_extra, test/export/test_export.py::TestExport::test_automatic_constrain_size, test/export/test_export.py::TestExport::test_automatic_dynamic_shapes_constant_relation, test/export/test_export.py::TestExport::test_automatic_dynamic_shapes_linear_relation, test/export/test_export.py::TestExport::test_automatic_dynamic_shapes_simple_equality, test/export/test_export.py::TestExport::test_baddbmm, 
test/export/test_export.py::TestExport::test_basic, test/export/test_export.py::TestExport::test_basic_non_strict_fake_tensor, test/export/test_export.py::TestExport::test_basic_non_strict_real_tensor, test/export/test_export.py::TestExport::test_bincount, test/export/test_export.py::TestExport::test_buffer_util, test/export/test_export.py::TestExport::test_capture_subclass_constructor, test/export/test_export.py::TestExport::test_capture_subclass_constructor_torch_ir, test/export/test_export.py::TestExport::test_capture_subclass_wrong, test/export/test_export.py::TestExport::test_ccode_python_mod, test/export/test_export.py::TestExport::test_check_specialized_int, test/export/test_export.py::TestExport::test_checks_to_constrain_range, test/export/test_export.py::TestExport::test_cleanup_dynamic_markers, test/export/test_export.py::TestExport::test_colin_unbacked_backed_vr_sub, test/export/test_export.py::TestExport::test_colon_parameter, test/export/test_export.py::TestExport::test_compiling_state, test/export/test_export.py::TestExport::test_cond_access_identical_symint_closure, test/export/test_export.py::TestExport::test_cond_branches_return_constant_int, test/export/test_export.py::TestExport::test_cond_branches_return_same_int, test/export/test_export.py::TestExport::test_cond_buffers, test/export/test_export.py::TestExport::test_cond_contains_unbacked_no_escape, test/export/test_export.py::TestExport::test_cond_int_closure, test/export/test_export.py::TestExport::test_cond_unflatten, test/export/test_export.py::TestExport::test_cond_with_module_stack_export_with, test/export/test_export.py::TestExport::test_cond_with_module_stack_export_with_unflatten, test/export/test_export.py::TestExport::test_constant_aliasing, test/export/test_export.py::TestExport::test_constant_input_naming, test/export/test_export.py::TestExport::test_constant_no_user_inp, test/export/test_export.py::TestExport::test_constant_output, 
test/export/test_export.py::TestExport::test_constant_output_dup, test/export/test_export.py::TestExport::test_constant_requires_grad_const, test/export/test_export.py::TestExport::test_constant_return, test/export/test_export.py::TestExport::test_constant_tensor_mutation, test/export/test_export.py::TestExport::test_constant_tensor_with_non_functional, test/export/test_export.py::TestExport::test_constant_tensor_with_non_functional_nested, test/export/test_export.py::TestExport::test_constrain_decomp, test/export/test_export.py::TestExport::test_constrain_size_in_eager, test/export/test_export.py::TestExport::test_constrain_size_with_constrain_value, test/export/test_export.py::TestExport::test_constrain_size_with_various_cases, test/export/test_export.py::TestExport::test_conv_dynamic, test/export/test_export.py::TestExport::test_crop_like, test/export/test_export.py::TestExport::test_cse_for_symint, test/export/test_export.py::TestExport::test_custom_op_auto_functionalize, test/export/test_export.py::TestExport::test_custom_op_auto_functionalize_pre_dispatch, test/export/test_export.py::TestExport::test_custom_op_auto_warn_pre_dispatch, test/export/test_export.py::TestExport::test_custom_op_preserve, test/export/test_export.py::TestExport::test_custom_pytree, test/export/test_export.py::TestExport::test_custom_tag_metadata_re_export, test/export/test_export.py::TestExport::test_decomp_batch_norm_functional_predispatch, test/export/test_export.py::TestExport::test_decomp_item_in_prim_after_decomposition, test/export/test_export.py::TestExport::test_decomp_item_in_prim_before_decomposition, test/export/test_export.py::TestExport::test_default_decomposition_core_cia_ops, test/export/test_export.py::TestExport::test_derived_dim_1_2, test/export/test_export.py::TestExport::test_derived_dim_basic, test/export/test_export.py::TestExport::test_derived_dim_integer, test/export/test_export.py::TestExport::test_derived_dim_nested, 
test/export/test_export.py::TestExport::test_derived_dim_out_of_order, test/export/test_export.py::TestExport::test_derived_dim_out_of_order_repeat_derived, test/export/test_export.py::TestExport::test_derived_dim_out_of_order_simplified, test/export/test_export.py::TestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived, test/export/test_export.py::TestExport::test_derived_dim_repeat_derived, test/export/test_export.py::TestExport::test_detect_leak_nonstrict, test/export/test_export.py::TestExport::test_detect_leak_nonstrict_with_stacktrace, test/export/test_export.py::TestExport::test_detect_leak_strict, test/export/test_export.py::TestExport::test_device_to_dynamic, test/export/test_export.py::TestExport::test_device_to_gpu, test/export/test_export.py::TestExport::test_device_to_mutation, test/export/test_export.py::TestExport::test_device_to_mutation_float, test/export/test_export.py::TestExport::test_device_to_static, test/export/test_export.py::TestExport::test_dim_1_2, test/export/test_export.py::TestExport::test_dim_auto_and_dim, test/export/test_export.py::TestExport::test_dim_dynamic, test/export/test_export.py::TestExport::test_dim_dynamic_divisibility, test/export/test_export.py::TestExport::test_dim_dynamic_specialization, test/export/test_export.py::TestExport::test_dim_hint_range_violations, test/export/test_export.py::TestExport::test_dim_hint_ranges, test/export/test_export.py::TestExport::test_disable_forced_specializations_errors, test/export/test_export.py::TestExport::test_disable_forced_specializations_ok, test/export/test_export.py::TestExport::test_distributed_all_gather, test/export/test_export.py::TestExport::test_distributed_all_gather_into_tensor, test/export/test_export.py::TestExport::test_distributed_all_reduce, test/export/test_export.py::TestExport::test_distributed_all_to_all_single, test/export/test_export.py::TestExport::test_distributed_reduce_scatter_tensor, 
test/export/test_export.py::TestExport::test_dont_duck_size_for_auto_dynamic, test/export/test_export.py::TestExport::test_double_lifted_constants, test/export/test_export.py::TestExport::test_draft_export_checks_aliasing, test/export/test_export.py::TestExport::test_draft_export_checks_mutation, test/export/test_export.py::TestExport::test_draft_export_checks_mutation_list, test/export/test_export.py::TestExport::test_draft_export_checks_mutation_with_nan, test/export/test_export.py::TestExport::test_draft_export_fake_kernel_inference_errors, test/export/test_export.py::TestExport::test_draft_export_infers_fake_kernel, test/export/test_export.py::TestExport::test_duplicate_modules_with_non_persistent_buffers, test/export/test_export.py::TestExport::test_dynamic_lr_shift, test/export/test_export.py::TestExport::test_dynamic_shapes_bounds, test/export/test_export.py::TestExport::test_dynamic_shapes_builder_basic, test/export/test_export.py::TestExport::test_dynamic_shapes_builder_kwargs, test/export/test_export.py::TestExport::test_dynamic_shapes_builder_pytree, test/export/test_export.py::TestExport::test_dynamic_shapes_dataclass, test/export/test_export.py::TestExport::test_dynamic_shapes_inferred_basic, test/export/test_export.py::TestExport::test_dynamic_shapes_serdes_generic, test/export/test_export.py::TestExport::test_dynamic_shapes_serdes_user_errors, test/export/test_export.py::TestExport::test_dynamic_shapes_serdes_various, test/export/test_export.py::TestExport::test_dynamic_shapes_spec_with_pytree, test/export/test_export.py::TestExport::test_dynamic_sym_round, test/export/test_export.py::TestExport::test_ends_of_bounds_oblivious, test/export/test_export.py::TestExport::test_error_does_not_reference_eager_fallback, test/export/test_export.py::TestExport::test_error_when_passing_mutating_primitive_op, test/export/test_export.py::TestExport::test_exception, test/export/test_export.py::TestExport::test_expand_copy_export_handles_implicit_true, 
test/export/test_export.py::TestExport::test_export_api_with_dynamic_shapes, test/export/test_export.py::TestExport::test_export_as_backend, test/export/test_export.py::TestExport::test_export_associative_scan_lifted_buffers, test/export/test_export.py::TestExport::test_export_associative_scan_symbol_dim, test/export/test_export.py::TestExport::test_export_associative_scan_symbol_scandim, test/export/test_export.py::TestExport::test_export_aten_to_unflatten, test/export/test_export.py::TestExport::test_export_aten_to_unflatten_subclass, test/export/test_export.py::TestExport::test_export_aten_to_unflatten_subclass_pre_dispatch, test/export/test_export.py::TestExport::test_export_cond_preserve_torch_fn_for_subgraphs, test/export/test_export.py::TestExport::test_export_cond_symbool_pred, test/export/test_export.py::TestExport::test_export_cond_warns_constant_pred, test/export/test_export.py::TestExport::test_export_custom_decomp_table_basic_pop, test/export/test_export.py::TestExport::test_export_custom_decomp_table_container_methods, test/export/test_export.py::TestExport::test_export_custom_op_lib, test/export/test_export.py::TestExport::test_export_custom_triton_kernel, test/export/test_export.py::TestExport::test_export_custom_triton_kernel_mutable, test/export/test_export.py::TestExport::test_export_cyclic_reference_leak, test/export/test_export.py::TestExport::test_export_decomp_torture_case_1, test/export/test_export.py::TestExport::test_export_decomp_torture_case_2, test/export/test_export.py::TestExport::test_export_decomps_dynamic, test/export/test_export.py::TestExport::test_export_decomps_simple, test/export/test_export.py::TestExport::test_export_dynamo_config, test/export/test_export.py::TestExport::test_export_for_training_run_decomp, test/export/test_export.py::TestExport::test_export_for_training_with_container_type, test/export/test_export.py::TestExport::test_export_for_training_with_dynamic_shapes, 
test/export/test_export.py::TestExport::test_export_for_training_with_mutation, test/export/test_export.py::TestExport::test_export_for_training_with_state_dict_hooks, test/export/test_export.py::TestExport::test_export_func_with_default_kwargs, test/export/test_export.py::TestExport::test_export_func_with_keyword_only_args, test/export/test_export.py::TestExport::test_export_func_with_kwargs, test/export/test_export.py::TestExport::test_export_func_with_pytree_kwargs, test/export/test_export.py::TestExport::test_export_func_with_var_keyword_args, test/export/test_export.py::TestExport::test_export_func_with_var_keyword_pytree_args, test/export/test_export.py::TestExport::test_export_func_with_var_postional_args, test/export/test_export.py::TestExport::test_export_function_schema, test/export/test_export.py::TestExport::test_export_graph_with_no_inputs, test/export/test_export.py::TestExport::test_export_input_mutation_bug, test/export/test_export.py::TestExport::test_export_input_mutation_dynamic_shape, test/export/test_export.py::TestExport::test_export_input_mutation_static_shape, test/export/test_export.py::TestExport::test_export_linear_preserve_dynamic_shape, test/export/test_export.py::TestExport::test_export_max_nonstrict, test/export/test_export.py::TestExport::test_export_max_onnx_reported, test/export/test_export.py::TestExport::test_export_method, test/export/test_export.py::TestExport::test_export_mod_constraints, test/export/test_export.py::TestExport::test_export_module, test/export/test_export.py::TestExport::test_export_preserve_linear_at_aot_level, test/export/test_export.py::TestExport::test_export_preserve_linear_but_not_custom_op, test/export/test_export.py::TestExport::test_export_scan_pytree_output, test/export/test_export.py::TestExport::test_export_script_module, test/export/test_export.py::TestExport::test_export_statically_known_true, test/export/test_export.py::TestExport::test_export_then_compile_tensor_ctor, 
test/export/test_export.py::TestExport::test_export_with_autocast, test/export/test_export.py::TestExport::test_export_with_fake_tensor_inputs, test/export/test_export.py::TestExport::test_export_with_fake_tensor_inputs_on_cuda_devices, test/export/test_export.py::TestExport::test_export_with_inline_constraints, test/export/test_export.py::TestExport::test_export_with_inline_constraints_complex, test/export/test_export.py::TestExport::test_export_with_set_grad_enabled, test/export/test_export.py::TestExport::test_export_with_wrong_inputs, test/export/test_export.py::TestExport::test_external_call_non_strict_real_tensor, test/export/test_export.py::TestExport::test_fake_inputs, test/export/test_export.py::TestExport::test_fake_weights, test/export/test_export.py::TestExport::test_filter_traceback_frames, test/export/test_export.py::TestExport::test_float_conversion, test/export/test_export.py::TestExport::test_float_conversion_from_int, test/export/test_export.py::TestExport::test_fqn, test/export/test_export.py::TestExport::test_from_node_metadata_export, test/export/test_export.py::TestExport::test_full_on_scalar_tensor, test/export/test_export.py::TestExport::test_function_holding_tensor, test/export/test_export.py::TestExport::test_hints_wrapper, test/export/test_export.py::TestExport::test_hoo_inline_users_issue, test/export/test_export.py::TestExport::test_if_functional, test/export/test_export.py::TestExport::test_if_post_autograd_op_preserved, test/export/test_export.py::TestExport::test_inline_script_class_method, test/export/test_export.py::TestExport::test_inline_script_class_method_recursive, test/export/test_export.py::TestExport::test_inline_script_function, test/export/test_export.py::TestExport::test_inline_script_method, test/export/test_export.py::TestExport::test_int_shape_specialization, test/export/test_export.py::TestExport::test_intermediate_shape_comp, test/export/test_export.py::TestExport::test_is_exporting, 
test/export/test_export.py::TestExport::test_is_non_negative_check_function, test/export/test_export.py::TestExport::test_is_nonzero, test/export/test_export.py::TestExport::test_isnonzero, test/export/test_export.py::TestExport::test_issue_113041, test/export/test_export.py::TestExport::test_issue_157289, test/export/test_export.py::TestExport::test_istft_op, test/export/test_export.py::TestExport::test_keep_composite_ops_invalid, test/export/test_export.py::TestExport::test_keep_composite_ops_linear_convd, test/export/test_export.py::TestExport::test_keep_composite_ops_linear_convd_for_training_ir, test/export/test_export.py::TestExport::test_kwarg_dynamic_shapes_diff_order, test/export/test_export.py::TestExport::test_kwargs_reorder, test/export/test_export.py::TestExport::test_layer_norm_unbacked_normalized_shape, test/export/test_export.py::TestExport::test_layer_sharing, test/export/test_export.py::TestExport::test_lazy_module_kwargs, test/export/test_export.py::TestExport::test_lifted_constants, test/export/test_export.py::TestExport::test_linear_conv, test/export/test_export.py::TestExport::test_malformed_fqn_from_source_name, test/export/test_export.py::TestExport::test_map, test/export/test_export.py::TestExport::test_map_buffers, test/export/test_export.py::TestExport::test_mask_nonzero_static, test/export/test_export.py::TestExport::test_masked_select_dynamic, test/export/test_export.py::TestExport::test_math_pow, test/export/test_export.py::TestExport::test_mismatched_dynamic_shapes, test/export/test_export.py::TestExport::test_mixed_input, test/export/test_export.py::TestExport::test_module, test/export/test_export.py::TestExport::test_module_dict_key, test/export/test_export.py::TestExport::test_module_input, test/export/test_export.py::TestExport::test_module_input_subclasses_parameterization_nested, test/export/test_export.py::TestExport::test_module_list_slice, test/export/test_export.py::TestExport::test_module_with_dict_container_inp_out, 
test/export/test_export.py::TestExport::test_modules_access_for_deleted_submodule, test/export/test_export.py::TestExport::test_more_multidimensional_slicing, test/export/test_export.py::TestExport::test_multidimensional_slicing, test/export/test_export.py::TestExport::test_multinomial_dynamic, test/export/test_export.py::TestExport::test_multiple_definitions_same_name_dim, test/export/test_export.py::TestExport::test_nested_dynamic_shapes_spec, test/export/test_export.py::TestExport::test_nested_module, test/export/test_export.py::TestExport::test_nested_module_with_constant_buffer, test/export/test_export.py::TestExport::test_nested_module_with_init_buffer, test/export/test_export.py::TestExport::test_nested_module_with_parameter, test/export/test_export.py::TestExport::test_nn_module_stack, test/export/test_export.py::TestExport::test_nn_module_stack_shared_submodule, test/export/test_export.py::TestExport::test_no_check_is_size_error, test/export/test_export.py::TestExport::test_no_suggested_fixes_for_data_dependent_errors, test/export/test_export.py::TestExport::test_no_tensor_computation, test/export/test_export.py::TestExport::test_no_tensor_computation_2, test/export/test_export.py::TestExport::test_no_tensor_computation_3, test/export/test_export.py::TestExport::test_no_tensor_computation_4, test/export/test_export.py::TestExport::test_non_arg_name_dynamic_shapes_api, test/export/test_export.py::TestExport::test_non_arg_name_dynamic_shapes_api_with_container_type, test/export/test_export.py::TestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg, test/export/test_export.py::TestExport::test_non_persistent_buffer, test/export/test_export.py::TestExport::test_non_strict_dynamic_shapes, test/export/test_export.py::TestExport::test_non_strict_dynamic_shapes_suggested_fixes, test/export/test_export.py::TestExport::test_none_buffers, test/export/test_export.py::TestExport::test_nonstrict_retrace_preserves_metadata, 
test/export/test_export.py::TestExport::test_nonzero_2, test/export/test_export.py::TestExport::test_nonzero_dynamic, test/export/test_export.py::TestExport::test_not_registered_parameter, test/export/test_export.py::TestExport::test_operator_aten_tensor_mode_variant, test/export/test_export.py::TestExport::test_output_node_name, test/export/test_export.py::TestExport::test_pad_sequence, test/export/test_export.py::TestExport::test_param_util, test/export/test_export.py::TestExport::test_partial_patched_forward, test/export/test_export.py::TestExport::test_placeholder_naming_collisions, test/export/test_export.py::TestExport::test_placeholder_naming_collisions_hoo_subgraphs, test/export/test_export.py::TestExport::test_placeholder_naming_order, test/export/test_export.py::TestExport::test_placeholder_naming_order_variadic, test/export/test_export.py::TestExport::test_placeholder_update_preserving, test/export/test_export.py::TestExport::test_predispatch_cond, test/export/test_export.py::TestExport::test_predispatch_grad_wrappers, test/export/test_export.py::TestExport::test_preserve_module_call_signature_unflatten_specialization, test/export/test_export.py::TestExport::test_preserve_requires_grad_placeholders, test/export/test_export.py::TestExport::test_preserve_shape_dynamism_for_unused_inputs, test/export/test_export.py::TestExport::test_profiling_code, test/export/test_export.py::TestExport::test_python_asserts_with_sym_int, test/export/test_export.py::TestExport::test_pytree_register_data_class, test/export/test_export.py::TestExport::test_pytree_register_nested_data_class, test/export/test_export.py::TestExport::test_raise_user_error_when_guard_on_data_dependent_operation, test/export/test_export.py::TestExport::test_range_constraints_with_replacement, test/export/test_export.py::TestExport::test_real_tensor_alias_dtype_mismatch, test/export/test_export.py::TestExport::test_real_tensor_bool_cast, 
test/export/test_export.py::TestExport::test_real_tensor_errors_on_aliasing_custom_op, test/export/test_export.py::TestExport::test_real_tensor_for_max_op, test/export/test_export.py::TestExport::test_real_tensor_size_mismatch, test/export/test_export.py::TestExport::test_redundant_assert_max_upper_bound, test/export/test_export.py::TestExport::test_redundant_asserts, test/export/test_export.py::TestExport::test_refine_dynamic_shapes_from_suggested_fixes, test/export/test_export.py::TestExport::test_register_constant, test/export/test_export.py::TestExport::test_repeat_interleave, test/export/test_export.py::TestExport::test_replace_unbacked_with_very_large_upperbound, test/export/test_export.py::TestExport::test_replaced_unbacked_bindings, test/export/test_export.py::TestExport::test_reshape_view_helper, test/export/test_export.py::TestExport::test_retracable_ep, test/export/test_export.py::TestExport::test_retrace_pre_autograd, test/export/test_export.py::TestExport::test_run_decomposition_supports_user_input_mutation, test/export/test_export.py::TestExport::test_run_decompositions_keep_metadata, test/export/test_export.py::TestExport::test_run_decompositions_keep_tensor_constant_metadata, test/export/test_export.py::TestExport::test_runtime_assert_for_prim, test/export/test_export.py::TestExport::test_runtime_assert_for_prm_str, test/export/test_export.py::TestExport::test_runtime_assert_with_size, test/export/test_export.py::TestExport::test_sdpa_gqa, test/export/test_export.py::TestExport::test_sequential_slicing, test/export/test_export.py::TestExport::test_set_example_inputs, test/export/test_export.py::TestExport::test_set_grad_as_side_effect, test/export/test_export.py::TestExport::test_set_grad_empty, test/export/test_export.py::TestExport::test_set_grad_unflatten, test/export/test_export.py::TestExport::test_setgrad_lifted_tensor, test/export/test_export.py::TestExport::test_shared_submodule_nn_module_stack, 
test/export/test_export.py::TestExport::test_simple_export_for_training, test/export/test_export.py::TestExport::test_simple_unbacked_view, test/export/test_export.py::TestExport::test_size_input, test/export/test_export.py::TestExport::test_slice_nn_module_stack, test/export/test_export.py::TestExport::test_solver_unsupported_sympy_function, test/export/test_export.py::TestExport::test_specialize_derived_dim_roots, test/export/test_export.py::TestExport::test_split_const_gm_with_lifted_constants, test/export/test_export.py::TestExport::test_stack_trace, test/export/test_export.py::TestExport::test_stack_trace_make_fx, test/export/test_export.py::TestExport::test_state_primitives, test/export/test_export.py::TestExport::test_state_shape_attribute_assignment, test/export/test_export.py::TestExport::test_state_tensors, test/export/test_export.py::TestExport::test_static_dim_constraints, test/export/test_export.py::TestExport::test_subclass_nested_attr_access, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_complicated_metadata, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_const_metadata, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_const_metadata_not_top_level, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_submodule, test/export/test_export.py::TestExport::test_subclasses_parameterization, test/export/test_export.py::TestExport::test_subclasses_parameterization_nested, test/export/test_export.py::TestExport::test_suggest_torch_checks_with_non_negative_check, test/export/test_export.py::TestExport::test_suggest_torch_checks_with_regular_check, test/export/test_export.py::TestExport::test_suggested_fixes_for_data_dependent_errors_basic, test/export/test_export.py::TestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers, test/export/test_export.py::TestExport::test_suggested_fixes_new_roots, 
test/export/test_export.py::TestExport::test_sym_float_operators, test/export/test_export.py::TestExport::test_sym_or_sym_and, test/export/test_export.py::TestExport::test_sym_sqrt, test/export/test_export.py::TestExport::test_symbool_item, test/export/test_export.py::TestExport::test_symfloat_item, test/export/test_export.py::TestExport::test_symint_input_additional_inputs, test/export/test_export.py::TestExport::test_symint_input_basic, test/export/test_export.py::TestExport::test_symint_input_ranges, test/export/test_export.py::TestExport::test_symint_input_shapes_collection, test/export/test_export.py::TestExport::test_symint_input_specialization, test/export/test_export.py::TestExport::test_symint_item, test/export/test_export.py::TestExport::test_symint_output, test/export/test_export.py::TestExport::test_symint_tensor_return, test/export/test_export.py::TestExport::test_tensor_attribute_zero_args, test/export/test_export.py::TestExport::test_tensor_constant_aten_to, test/export/test_export.py::TestExport::test_tensor_constant_with_wrapped_method, test/export/test_export.py::TestExport::test_to_module_with_mutated_buffer, test/export/test_export.py::TestExport::test_to_module_with_mutated_buffer_multiple, test/export/test_export.py::TestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later, test/export/test_export.py::TestExport::test_tolist, test/export/test_export.py::TestExport::test_torch_check_eq_commutativity, test/export/test_export.py::TestExport::test_torch_fn, test/export/test_export.py::TestExport::test_trace_under_fake, test/export/test_export.py::TestExport::test_train_eval_on_exported_preautograd_module, test/export/test_export.py::TestExport::test_unbacked_3d_matmul, test/export/test_export.py::TestExport::test_unbacked_bincount, test/export/test_export.py::TestExport::test_unbacked_bindings_for_divisible_u_symint, test/export/test_export.py::TestExport::test_unbacked_deferred_runtime_retrace, 
test/export/test_export.py::TestExport::test_unbacked_expand, test/export/test_export.py::TestExport::test_unbacked_infer_size, test/export/test_export.py::TestExport::test_unbacked_kth_value, test/export/test_export.py::TestExport::test_unbacked_linear_layer_norm_input, test/export/test_export.py::TestExport::test_unbacked_noncontig_lin, test/export/test_export.py::TestExport::test_unbacked_pad, test/export/test_export.py::TestExport::test_unbacked_scalar_constructor, test/export/test_export.py::TestExport::test_unbacked_slice, test/export/test_export.py::TestExport::test_unbacked_to_cond, test/export/test_export.py::TestExport::test_unbacked_to_cond_passthrough, test/export/test_export.py::TestExport::test_unbacked_unsqueeze, test/export/test_export.py::TestExport::test_unflatten_asserts, test/export/test_export.py::TestExport::test_unflatten_buffer_update_child2parent_swap, test/export/test_export.py::TestExport::test_unflatten_closure, test/export/test_export.py::TestExport::test_unflatten_isinstance, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_dispatch, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_preserve_signature_no_error, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_shared_submodule, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_state, test/export/test_export.py::TestExport::test_unflatten_no_unroll, test/export/test_export.py::TestExport::test_unflatten_placeholder_update_child2parent_swap, test/export/test_export.py::TestExport::test_unflatten_placeholder_update_grandchild2cousin_swap, test/export/test_export.py::TestExport::test_unflatten_random_dag_5, test/export/test_export.py::TestExport::test_unflatten_random_dag_6, test/export/test_export.py::TestExport::test_unflatten_random_dag_buf_8, test/export/test_export.py::TestExport::test_unflatten_random_dag_const_preserving_3, 
test/export/test_export.py::TestExport::test_unflatten_random_dag_const_preserving_3_1, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_4, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_6, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_9, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_10, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_4, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_5, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_7, test/export/test_export.py::TestExport::test_unflatten_random_dag_preserving_4, test/export/test_export.py::TestExport::test_unused_aliases, test/export/test_export.py::TestExport::test_unused_constant, test/export/test_export.py::TestExport::test_use_embedding_twice, test/export/test_export.py::TestExport::test_user_input_and_buffer_mutation, test/export/test_export.py::TestExport::test_vmap, test/export/test_export.py::TestExport::test_while_loop_assert_separation, test/export/test_export.py::TestExport::test_while_loop_index_assertions, test/export/test_export.py::TestExport::test_while_loop_simple, test/export/test_export.py::TestExport::test_while_loop_tensor_constant_idx, test/export/test_export.py::TestExport::test_wrapper_module, test/export/test_export.py::TestOneOffModelExportResult::test_assert_tensor_metadata_device_index, test/export/test_export.py::TestOneOffModelExportResult::test_constant_fqn, test/export/test_export.py::TestOneOffModelExportResult::test_constant_name, test/export/test_export.py::TestOneOffModelExportResult::test_duplicated_getitem, test/export/test_export.py::TestOneOffModelExportResult::test_hf_logging_logger, 
test/export/test_export.py::TestOneOffModelExportResult::test_input_output_no_stacktrace, test/export/test_export.py::TestOneOffModelExportResult::test_int_list_output, test/export/test_export.py::TestOneOffModelExportResult::test_logging_logger, test/export/test_export.py::TestOneOffModelExportResult::test_nested_retrace, test/export/test_export.py::TestOneOffModelExportResult::test_none_input_output, test/export/test_export.py::TestOneOffModelExportResult::test_primitive_constant_output, test/export/test_export.py::TestOneOffModelExportResult::test_print, test/export/test_export.py::TestOneOffModelExportResult::test_print_graph_signature, test/export/test_export.py::TestOneOffModelExportResult::test_scaled_dot_product_attention_cpu, test/export/test_export.py::TestOneOffModelExportResult::test_scaled_dot_product_attention_cuda, test/export/test_export.py::TestOneOffModelExportResult::test_strict_export_with_shared_parameters, test/export/test_export.py::TestOneOffModelExportResult::test_torchrec_jagged_tensor, test/export/test_export.py::TestOneOffModelExportResult::test_unbacked_sdpa, test/export/test_export.py::TestOneOffModelExportResult::test_warning, test/export/test_export.py::TestExportCustomClass::test_export_script_module, test/export/test_export.py::TestExportCustomClass::test_export_unbacked_lt, test/export/test_export.py::TestExportCustomClass::test_int_lift_constant, test/export/test_export.py::TestExportCustomClass::test_is_fx_tracing, test/export/test_export.py::TestExportCustomClass::test_item, test/export/test_export.py::TestExportCustomClass::test_lift_custom_obj, test/export/test_export.py::TestExportCustomClass::test_preserve_cia_op, test/export/test_export.py::TestExportCustomClass::test_preserve_non_cia_op, test/export/test_export.py::TestExportCustomClass::test_unbacked_contiguous, test/export/test_export.py::TestExportCustomClass::test_unbacked_select_index 2025-09-07T07:27:16.7338645Z 2025-09-07T07:27:16.7338859Z Running 
dynamo/test_inline_and_install 1/1 ... [2025-09-07 07:27:16.713174] 2025-09-07T07:27:16.7339263Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:16.7340224Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_inline_and_install.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:16.713550] 2025-09-07T07:27:18.7913298Z 2025-09-07T07:27:18.7914630Z inductor/test_torchinductor_strided_blocks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_strided_blocks_1.1_37e54bed41f3ea6b_.log 2025-09-07T07:27:18.8071495Z Running 297 items in this shard: test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reduction_multi_kernel_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reduction_odd_shapes_view_size0_num_block_pointers_1_num_triton_kernels_1_reduction_op0_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reduction_odd_shapes_view_size1_num_block_pointers_3_num_triton_kernels_2_reduction_op1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reduction_odd_shapes_view_size2_num_block_pointers_1_num_triton_kernels_1_reduction_op2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reduction_odd_shapes_view_size3_num_block_pointers_1_num_triton_kernels_1_reduction_op3_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reduction_odd_shapes_view_size4_num_block_pointers_1_num_triton_kernels_1_reduction_op4_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reductions_mixed_indexing_reduction_op0_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_reductions_mixed_indexing_reduction_op1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_welford_reduction_size0_expected_num_block_pointers_1_expected_num_triton_kernels_1_expect_fallback_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_2d_welford_reduction_size1_expected_num_block_pointers_9_expected_num_triton_kernels_2_expect_fallback_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_3d_permute_tiling_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_boundary_check_block_multiple_False_ynumel_exceed_ygrid_size_False_include_z_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_boundary_check_block_multiple_True_ynumel_exceed_ygrid_size_False_include_z_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_boundary_check_block_multiple_True_ynumel_exceed_ygrid_size_True_include_z_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_False_x_size0_y_size0_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_False_x_size1_y_size1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_False_x_size2_y_size2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_False_x_size3_y_size3_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_True_x_size0_y_size0_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_True_x_size1_y_size1_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_True_x_size2_y_size2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_prefer_nd_tiling_True_x_size3_y_size3_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_broadcast_with_singleton_dims_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_complex_reshape_block_ptr_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_dynamic_shapes_pointwise_multiple_max_block_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_dynamic_shapes_pointwise_nd_tiling_False_num_block_pointers_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_dynamic_shapes_pointwise_nd_tiling_True_num_block_pointers_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_dynamic_shapes_reduction_with_tiling_False_num_block_pointers_0_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_dynamic_shapes_reduction_with_tiling_True_num_block_pointers_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_enable_tiled_reductions_tile_reductions_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_enable_tiled_reductions_tile_reductions_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_ensure_integral_dims_and_strides_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size0_y_size0_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size1_y_size1_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size2_y_size2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size3_y_size3_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size4_y_size4_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size5_y_size5_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size6_y_size6_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size7_y_size7_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size8_y_size8_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_broadcast_x_size9_y_size9_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expand_clone_broadcast_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expected_num_block_pointers_expected_num_block_pointers_3_raises_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_expected_num_block_pointers_expected_num_block_pointers_9_raises_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_fused_2d_reduction_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_mixed_pointwise_reduction_view_size0_num_block_pointers_2_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_mixed_pointwise_reduction_view_size1_num_block_pointers1_num_triton_kernels1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_mul_broadcast_multi_output_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_multiple_max_block_non_power_of_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_nd_tiling_odd_shapes_pointwise_full_size0_view_size0_num_block_pointers_3_num_tiles_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_nd_tiling_odd_shapes_pointwise_full_size1_view_size1_num_block_pointers_3_num_tiles_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_nd_tiling_odd_shapes_pointwise_full_size2_view_size2_num_block_pointers_3_num_tiles_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_nd_tiling_odd_shapes_pointwise_full_size3_view_size3_num_block_pointers_3_num_tiles_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_nd_tiling_odd_shapes_pointwise_full_size4_view_size4_num_block_pointers_3_num_tiles_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_nd_tiling_odd_shapes_pointwise_full_size5_view_size5_num_block_pointers_1_num_tiles_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_negative_strides_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_broadcast_nonzero_strides_prefer_nd_tiling_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_broadcast_nonzero_strides_prefer_nd_tiling_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_index_order_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size0_view_size0_stride0_offset0_require_block_ptr_True_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size1_view_size1_stride1_offset1_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size2_view_size2_stride2_offset2_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size3_view_size3_stride3_offset_10_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size4_view_size4_stride4_offset4_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size5_view_size5_stride5_offset5_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size6_view_size6_stride6_offset6_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size7_view_size7_stride7_offset7_require_block_ptr_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size8_view_size8_stride8_offset8_require_block_ptr_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_False_full_size9_view_size9_stride9_offset9_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size0_view_size0_stride0_offset0_require_block_ptr_True_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size1_view_size1_stride1_offset1_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size2_view_size2_stride2_offset2_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size3_view_size3_stride3_offset_10_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size4_view_size4_stride4_offset4_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size5_view_size5_stride5_offset5_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size6_view_size6_stride6_offset6_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size7_view_size7_stride7_offset7_require_block_ptr_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size8_view_size8_stride8_offset8_require_block_ptr_False_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_pointwise_prefer_nd_tiling_True_full_size9_view_size9_stride9_offset9_require_block_ptr_True_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_multiple_discontiguous_dims_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size0_num_block_pointers_1_num_triton_kernels_1_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size1_num_block_pointers_1_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size2_num_block_pointers_1_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size3_num_block_pointers3_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size4_num_block_pointers_3_num_triton_kernels_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size5_num_block_pointers_2_num_triton_kernels_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_False_view_size6_num_block_pointers_3_num_triton_kernels_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size0_num_block_pointers_1_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size1_num_block_pointers_1_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size2_num_block_pointers_1_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size3_num_block_pointers3_num_triton_kernels_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size4_num_block_pointers_3_num_triton_kernels_2_cpu, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size5_num_block_pointers_2_num_triton_kernels_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_reduction_prefer_nd_tiling_True_view_size6_num_block_pointers_3_num_triton_kernels_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_removed_buffers_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_unbacked_size_on_non_contig_dim_num_tile_candidates_1_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_unbacked_size_on_non_contig_dim_num_tile_candidates_2_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestCPU::test_welford_non_block_pointer_cpu, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reduction_multi_kernel_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reduction_odd_shapes_view_size0_num_block_pointers_1_num_triton_kernels_1_reduction_op0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reduction_odd_shapes_view_size1_num_block_pointers_3_num_triton_kernels_2_reduction_op1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reduction_odd_shapes_view_size2_num_block_pointers_1_num_triton_kernels_1_reduction_op2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reduction_odd_shapes_view_size3_num_block_pointers_1_num_triton_kernels_1_reduction_op3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reduction_odd_shapes_view_size4_num_block_pointers_1_num_triton_kernels_1_reduction_op4_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reductions_mixed_indexing_reduction_op0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_reductions_mixed_indexing_reduction_op1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_welford_reduction_size0_expected_num_block_pointers_1_expected_num_triton_kernels_1_expect_fallback_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_2d_welford_reduction_size1_expected_num_block_pointers_9_expected_num_triton_kernels_2_expect_fallback_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_3d_permute_tiling_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_boundary_check_block_multiple_False_ynumel_exceed_ygrid_size_False_include_z_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_boundary_check_block_multiple_True_ynumel_exceed_ygrid_size_False_include_z_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_boundary_check_block_multiple_True_ynumel_exceed_ygrid_size_True_include_z_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_False_x_size0_y_size0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_False_x_size1_y_size1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_False_x_size2_y_size2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_False_x_size3_y_size3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_True_x_size0_y_size0_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_True_x_size1_y_size1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_True_x_size2_y_size2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_prefer_nd_tiling_True_x_size3_y_size3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_broadcast_with_singleton_dims_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_complex_reshape_block_ptr_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_dynamic_shapes_pointwise_multiple_max_block_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_dynamic_shapes_pointwise_nd_tiling_False_num_block_pointers_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_dynamic_shapes_pointwise_nd_tiling_True_num_block_pointers_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_dynamic_shapes_reduction_with_tiling_False_num_block_pointers_0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_dynamic_shapes_reduction_with_tiling_True_num_block_pointers_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_enable_tiled_reductions_tile_reductions_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_enable_tiled_reductions_tile_reductions_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_ensure_integral_dims_and_strides_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size0_y_size0_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size1_y_size1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size2_y_size2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size3_y_size3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size4_y_size4_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size5_y_size5_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size6_y_size6_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size7_y_size7_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size8_y_size8_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_broadcast_x_size9_y_size9_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expand_clone_broadcast_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expected_num_block_pointers_expected_num_block_pointers_3_raises_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_expected_num_block_pointers_expected_num_block_pointers_9_raises_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_fused_2d_reduction_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_mixed_pointwise_reduction_view_size0_num_block_pointers_2_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_mixed_pointwise_reduction_view_size1_num_block_pointers1_num_triton_kernels1_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_mul_broadcast_multi_output_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_multiple_max_block_non_power_of_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_nd_tiling_odd_shapes_pointwise_full_size0_view_size0_num_block_pointers_3_num_tiles_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_nd_tiling_odd_shapes_pointwise_full_size1_view_size1_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_nd_tiling_odd_shapes_pointwise_full_size2_view_size2_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_nd_tiling_odd_shapes_pointwise_full_size3_view_size3_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_nd_tiling_odd_shapes_pointwise_full_size4_view_size4_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_nd_tiling_odd_shapes_pointwise_full_size5_view_size5_num_block_pointers_1_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_negative_strides_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_broadcast_nonzero_strides_prefer_nd_tiling_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_broadcast_nonzero_strides_prefer_nd_tiling_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_index_order_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size0_view_size0_stride0_offset0_require_block_ptr_True_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size1_view_size1_stride1_offset1_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size2_view_size2_stride2_offset2_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size3_view_size3_stride3_offset_10_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size4_view_size4_stride4_offset4_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size5_view_size5_stride5_offset5_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size6_view_size6_stride6_offset6_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size7_view_size7_stride7_offset7_require_block_ptr_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size8_view_size8_stride8_offset8_require_block_ptr_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_False_full_size9_view_size9_stride9_offset9_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size0_view_size0_stride0_offset0_require_block_ptr_True_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size1_view_size1_stride1_offset1_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size2_view_size2_stride2_offset2_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size3_view_size3_stride3_offset_10_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size4_view_size4_stride4_offset4_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size5_view_size5_stride5_offset5_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size6_view_size6_stride6_offset6_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size7_view_size7_stride7_offset7_require_block_ptr_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size8_view_size8_stride8_offset8_require_block_ptr_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_pointwise_prefer_nd_tiling_True_full_size9_view_size9_stride9_offset9_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_multiple_discontiguous_dims_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size0_num_block_pointers_1_num_triton_kernels_1_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size1_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size2_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size3_num_block_pointers3_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size4_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size5_num_block_pointers_2_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_False_view_size6_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size0_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size1_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size2_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size3_num_block_pointers3_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size4_num_block_pointers_3_num_triton_kernels_2_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size5_num_block_pointers_2_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_reduction_prefer_nd_tiling_True_view_size6_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_removed_buffers_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_unbacked_size_on_non_contig_dim_num_tile_candidates_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_unbacked_size_on_non_contig_dim_num_tile_candidates_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonBlockPointerTestGPU::test_welford_non_block_pointer_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_multi_kernel_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_odd_shapes_view_size0_num_block_pointers_1_num_triton_kernels_1_reduction_op0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_odd_shapes_view_size1_num_block_pointers_3_num_triton_kernels_2_reduction_op1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_odd_shapes_view_size2_num_block_pointers_1_num_triton_kernels_1_reduction_op2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_odd_shapes_view_size3_num_block_pointers_1_num_triton_kernels_1_reduction_op3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_odd_shapes_view_size4_num_block_pointers_1_num_triton_kernels_1_reduction_op4_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reductions_mixed_indexing_reduction_op0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reductions_mixed_indexing_reduction_op1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_welford_reduction_size0_expected_num_block_pointers_1_expected_num_triton_kernels_1_expect_fallback_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_welford_reduction_size1_expected_num_block_pointers_9_expected_num_triton_kernels_2_expect_fallback_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_3d_permute_tiling_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_boundary_check_block_multiple_False_ynumel_exceed_ygrid_size_False_include_z_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_boundary_check_block_multiple_True_ynumel_exceed_ygrid_size_False_include_z_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_boundary_check_block_multiple_True_ynumel_exceed_ygrid_size_True_include_z_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_False_x_size0_y_size0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_False_x_size1_y_size1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_False_x_size2_y_size2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_False_x_size3_y_size3_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_True_x_size0_y_size0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_True_x_size1_y_size1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_True_x_size2_y_size2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_prefer_nd_tiling_True_x_size3_y_size3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_broadcast_with_singleton_dims_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_complex_reshape_block_ptr_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_dynamic_shapes_pointwise_multiple_max_block_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_dynamic_shapes_pointwise_nd_tiling_False_num_block_pointers_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_dynamic_shapes_pointwise_nd_tiling_True_num_block_pointers_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_dynamic_shapes_reduction_with_tiling_False_num_block_pointers_0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_dynamic_shapes_reduction_with_tiling_True_num_block_pointers_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_enable_tiled_reductions_tile_reductions_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_enable_tiled_reductions_tile_reductions_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_ensure_integral_dims_and_strides_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size0_y_size0_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size1_y_size1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size2_y_size2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size3_y_size3_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size4_y_size4_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size5_y_size5_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size6_y_size6_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size7_y_size7_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size8_y_size8_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_broadcast_x_size9_y_size9_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expand_clone_broadcast_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expected_num_block_pointers_expected_num_block_pointers_3_raises_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_expected_num_block_pointers_expected_num_block_pointers_9_raises_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_fused_2d_reduction_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_mixed_pointwise_reduction_view_size0_num_block_pointers_2_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_mixed_pointwise_reduction_view_size1_num_block_pointers1_num_triton_kernels1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_mul_broadcast_multi_output_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_multiple_max_block_non_power_of_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_nd_tiling_odd_shapes_pointwise_full_size0_view_size0_num_block_pointers_3_num_tiles_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_nd_tiling_odd_shapes_pointwise_full_size1_view_size1_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_nd_tiling_odd_shapes_pointwise_full_size2_view_size2_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_nd_tiling_odd_shapes_pointwise_full_size3_view_size3_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_nd_tiling_odd_shapes_pointwise_full_size4_view_size4_num_block_pointers_3_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_nd_tiling_odd_shapes_pointwise_full_size5_view_size5_num_block_pointers_1_num_tiles_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_negative_strides_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_broadcast_nonzero_strides_prefer_nd_tiling_False_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_broadcast_nonzero_strides_prefer_nd_tiling_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_index_order_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size0_view_size0_stride0_offset0_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size1_view_size1_stride1_offset1_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size2_view_size2_stride2_offset2_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size3_view_size3_stride3_offset_10_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size4_view_size4_stride4_offset4_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size5_view_size5_stride5_offset5_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size6_view_size6_stride6_offset6_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size7_view_size7_stride7_offset7_require_block_ptr_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size8_view_size8_stride8_offset8_require_block_ptr_False_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_False_full_size9_view_size9_stride9_offset9_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size0_view_size0_stride0_offset0_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size1_view_size1_stride1_offset1_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size2_view_size2_stride2_offset2_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size3_view_size3_stride3_offset_10_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size4_view_size4_stride4_offset4_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size5_view_size5_stride5_offset5_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size6_view_size6_stride6_offset6_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size7_view_size7_stride7_offset7_require_block_ptr_False_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size8_view_size8_stride8_offset8_require_block_ptr_False_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_pointwise_prefer_nd_tiling_True_full_size9_view_size9_stride9_offset9_require_block_ptr_True_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_multiple_discontiguous_dims_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size0_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size1_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size2_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size3_num_block_pointers3_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size4_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size5_num_block_pointers_2_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_False_view_size6_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size0_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size1_num_block_pointers_1_num_triton_kernels_1_cuda, 
test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size2_num_block_pointers_1_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size3_num_block_pointers3_num_triton_kernels_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size4_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size5_num_block_pointers_2_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_reduction_prefer_nd_tiling_True_view_size6_num_block_pointers_3_num_triton_kernels_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_removed_buffers_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_unbacked_size_on_non_contig_dim_num_tile_candidates_1_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_unbacked_size_on_non_contig_dim_num_tile_candidates_2_cuda, test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_welford_non_block_pointer_cuda 2025-09-07T07:27:18.8222286Z 2025-09-07T07:27:18.8222478Z Running inductor/test_smoke 1/1 ... [2025-09-07 07:27:18.791972] 2025-09-07T07:27:18.8222958Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:18.8223891Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_smoke.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:18.792286] 2025-09-07T07:27:19.0469688Z 2025-09-07T07:27:19.0470487Z dynamo/test_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_interop_1.1_8b2a5e6945ae9060_.log 2025-09-07T07:27:19.0472571Z Running 5 items in this shard: test/dynamo/test_interop.py::InteropTests::test_fx_fn, test/dynamo/test_interop.py::InteropTests::test_script_fn, test/dynamo/test_interop.py::InteropTests::test_staticmethod_script_fn, test/dynamo/test_interop.py::InteropTests::test_trace_fn, test/dynamo/test_interop.py::InteropTests::test_vmap_in_graph 2025-09-07T07:27:19.0474051Z 2025-09-07T07:27:19.0474308Z Running torch_np/test_ufuncs_basic 1/1 ... [2025-09-07 07:27:19.047131] 2025-09-07T07:27:19.0474807Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:19.0476896Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ufuncs_basic.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:19.047493] 2025-09-07T07:27:19.3264238Z 2025-09-07T07:27:19.3265334Z dynamo/test_metrics_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_metrics_context_1.1_941909de0dafc462_.log 2025-09-07T07:27:19.3271261Z Running 9 items in this shard: test/dynamo/test_metrics_context.py::TestMetricsContext::test_add_to_set, test/dynamo/test_metrics_context.py::TestMetricsContext::test_context_exists, test/dynamo/test_metrics_context.py::TestMetricsContext::test_nested_context, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set_disallow_overwrite, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set_key_value, test/dynamo/test_metrics_context.py::TestMetricsContext::test_top_n, test/dynamo/test_metrics_context.py::TestMetricsContext::test_update_allow_overwrite, test/dynamo/test_metrics_context.py::TestMetricsContext::test_update_disallow_overwrite 2025-09-07T07:27:19.3275427Z 2025-09-07T07:27:19.3275654Z Running test_proxy_tensor 1/1 ... [2025-09-07 07:27:19.326532] 2025-09-07T07:27:19.3276164Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:19.3277487Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_proxy_tensor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:19.326839] 2025-09-07T07:27:19.9262863Z 2025-09-07T07:27:19.9264170Z test_functionalization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functionalization_1.1_f05544953c839e37_.log 2025-09-07T07:27:19.9307095Z Running 112 items in this shard: test/test_functionalization.py::TestFunctionalization::test_advanced_indexing, test/test_functionalization.py::TestFunctionalization::test_advanced_indexing_correct_strides, test/test_functionalization.py::TestFunctionalization::test_aliases_maintained_after_pass_when_reapplying_views, test/test_functionalization.py::TestFunctionalization::test_as_strided, test/test_functionalization.py::TestFunctionalization::test_batch_norm, test/test_functionalization.py::TestFunctionalization::test_cat, test/test_functionalization.py::TestFunctionalization::test_channels_last_contiguous, test/test_functionalization.py::TestFunctionalization::test_copy_, test/test_functionalization.py::TestFunctionalization::test_copy_stride_mismatch, test/test_functionalization.py::TestFunctionalization::test_diagonal, test/test_functionalization.py::TestFunctionalization::test_diagonal_mutated_input, test/test_functionalization.py::TestFunctionalization::test_everything, test/test_functionalization.py::TestFunctionalization::test_expand_symint, test/test_functionalization.py::TestFunctionalization::test_fill_, test/test_functionalization.py::TestFunctionalization::test_freeze, test/test_functionalization.py::TestFunctionalization::test_index_mutation_on_non_input, test/test_functionalization.py::TestFunctionalization::test_inplace_on_non_view, test/test_functionalization.py::TestFunctionalization::test_instance_norm, test/test_functionalization.py::TestFunctionalization::test_metadata_change, test/test_functionalization.py::TestFunctionalization::test_metadata_change_out_op, test/test_functionalization.py::TestFunctionalization::test_mixed_wrappers_invalid, 
test/test_functionalization.py::TestFunctionalization::test_mixed_wrappers_valid, test/test_functionalization.py::TestFunctionalization::test_multi_out, test/test_functionalization.py::TestFunctionalization::test_multiple_views_of_same_base, test/test_functionalization.py::TestFunctionalization::test_mutable_op_not_inplace_or_other, test/test_functionalization.py::TestFunctionalization::test_mutation_overlapping_mem, test/test_functionalization.py::TestFunctionalization::test_nested_functions_propagate_updates, test/test_functionalization.py::TestFunctionalization::test_only_one_view, test/test_functionalization.py::TestFunctionalization::test_optional_tensor_list, test/test_functionalization.py::TestFunctionalization::test_python_functionalization, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_conj, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_is_conj, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_is_neg, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_lift_fresh, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_lift_fresh_storage, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_neg, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_zero_tensor, test/test_functionalization.py::TestFunctionalization::test_reapply_views_simple, test/test_functionalization.py::TestFunctionalization::test_resize_larger_invalid, test/test_functionalization.py::TestFunctionalization::test_resize_larger_valid, test/test_functionalization.py::TestFunctionalization::test_resize_same_size_diff_rank, test/test_functionalization.py::TestFunctionalization::test_resize_smaller, test/test_functionalization.py::TestFunctionalization::test_save_for_backwards_segfault, test/test_functionalization.py::TestFunctionalization::test_scalars, 
test/test_functionalization.py::TestFunctionalization::test_set_, test/test_functionalization.py::TestFunctionalization::test_simple, test/test_functionalization.py::TestFunctionalization::test_simple_out, test/test_functionalization.py::TestFunctionalization::test_slice, test/test_functionalization.py::TestFunctionalization::test_split, test/test_functionalization.py::TestFunctionalization::test_split_with_sizes, test/test_functionalization.py::TestFunctionalization::test_tensor_ctr, test/test_functionalization.py::TestFunctionalization::test_tensor_list_composite, test/test_functionalization.py::TestFunctionalization::test_tensor_list_mixed_functional_nonfunctional, test/test_functionalization.py::TestFunctionalization::test_unbind, test/test_functionalization.py::TestFunctionalization::test_view_clone_view_inplace, test/test_functionalization.py::TestFunctionalization::test_view_inplace, test/test_functionalization.py::TestCrossRefFunctionalization::test_advanced_indexing, test/test_functionalization.py::TestCrossRefFunctionalization::test_advanced_indexing_correct_strides, test/test_functionalization.py::TestCrossRefFunctionalization::test_aliases_maintained_after_pass_when_reapplying_views, test/test_functionalization.py::TestCrossRefFunctionalization::test_as_strided, test/test_functionalization.py::TestCrossRefFunctionalization::test_batch_norm, test/test_functionalization.py::TestCrossRefFunctionalization::test_cat, test/test_functionalization.py::TestCrossRefFunctionalization::test_channels_last_contiguous, test/test_functionalization.py::TestCrossRefFunctionalization::test_copy_, test/test_functionalization.py::TestCrossRefFunctionalization::test_copy_stride_mismatch, test/test_functionalization.py::TestCrossRefFunctionalization::test_diagonal, test/test_functionalization.py::TestCrossRefFunctionalization::test_diagonal_mutated_input, test/test_functionalization.py::TestCrossRefFunctionalization::test_everything, 
test/test_functionalization.py::TestCrossRefFunctionalization::test_expand_symint, test/test_functionalization.py::TestCrossRefFunctionalization::test_fill_, test/test_functionalization.py::TestCrossRefFunctionalization::test_freeze, test/test_functionalization.py::TestCrossRefFunctionalization::test_index_mutation_on_non_input, test/test_functionalization.py::TestCrossRefFunctionalization::test_inplace_on_non_view, test/test_functionalization.py::TestCrossRefFunctionalization::test_instance_norm, test/test_functionalization.py::TestCrossRefFunctionalization::test_metadata_change, test/test_functionalization.py::TestCrossRefFunctionalization::test_metadata_change_out_op, test/test_functionalization.py::TestCrossRefFunctionalization::test_mixed_wrappers_invalid, test/test_functionalization.py::TestCrossRefFunctionalization::test_mixed_wrappers_valid, test/test_functionalization.py::TestCrossRefFunctionalization::test_multi_out, test/test_functionalization.py::TestCrossRefFunctionalization::test_multiple_views_of_same_base, test/test_functionalization.py::TestCrossRefFunctionalization::test_mutable_op_not_inplace_or_other, test/test_functionalization.py::TestCrossRefFunctionalization::test_mutation_overlapping_mem, test/test_functionalization.py::TestCrossRefFunctionalization::test_nested_functions_propagate_updates, test/test_functionalization.py::TestCrossRefFunctionalization::test_only_one_view, test/test_functionalization.py::TestCrossRefFunctionalization::test_optional_tensor_list, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_conj, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_is_conj, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_is_neg, 
test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_lift_fresh, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_lift_fresh_storage, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_neg, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_zero_tensor, test/test_functionalization.py::TestCrossRefFunctionalization::test_reapply_views_simple, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_larger_invalid, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_larger_valid, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_same_size_diff_rank, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_smaller, test/test_functionalization.py::TestCrossRefFunctionalization::test_save_for_backwards_segfault, test/test_functionalization.py::TestCrossRefFunctionalization::test_scalars, test/test_functionalization.py::TestCrossRefFunctionalization::test_set_, test/test_functionalization.py::TestCrossRefFunctionalization::test_simple, test/test_functionalization.py::TestCrossRefFunctionalization::test_simple_out, test/test_functionalization.py::TestCrossRefFunctionalization::test_slice, test/test_functionalization.py::TestCrossRefFunctionalization::test_split, test/test_functionalization.py::TestCrossRefFunctionalization::test_split_with_sizes, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_ctr, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_list_composite, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_list_mixed_functional_nonfunctional, test/test_functionalization.py::TestCrossRefFunctionalization::test_unbind, test/test_functionalization.py::TestCrossRefFunctionalization::test_view_clone_view_inplace, 
test/test_functionalization.py::TestCrossRefFunctionalization::test_view_inplace 2025-09-07T07:27:19.9339688Z 2025-09-07T07:27:19.9339874Z Running inductor/test_fx_fusion 1/1 ... [2025-09-07 07:27:19.926534] 2025-09-07T07:27:19.9340263Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:19.9341187Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fx_fusion.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:19.926869] 2025-09-07T07:27:21.2848122Z 2025-09-07T07:27:21.2848930Z dynamo/test_inline_and_install 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_inline_and_install_1.1_0a7ea4c8257d667d_.log 2025-09-07T07:27:21.2930014Z Running 184 items in this shard: test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_access_class_method_from_user_class_attr_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_access_class_method_from_user_class_builtin_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_byte_tensor_does_not_crash_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_capture_symbolic_tracing_simple_within_fake_mode_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_capture_symbolic_tracing_within_fake_mode_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_free_variables_overlapping_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_op_param_buffer_lifted_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_branch_args_mismatch_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_branch_return_multiple_tensors_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_branch_return_non_tensor_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_mismatch_return_length_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_mismatch_return_tensor_meta_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_missing_args_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_non_list_operands_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_non_tensor_operands_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_raise_user_error_on_unsupported_pred_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_cond_supported_pred_types_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_constraint_violation_error_messages_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dataclass_input_output_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dict_return_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dict_return_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_2_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_2_with_aten_graph_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_with_non_tensor_arg_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_with_non_tensor_arg_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_with_non_tensor_output_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_and_bypass_with_non_tensor_output_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dupes_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dynamic_slicing_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dynamic_slicing_invalid_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dynamic_slicing_simple_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dynamo_enum_in_tuple_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_dynamo_list_index_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_empty_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_enforce_equalities_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_compare_optimize_with_make_fx_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_cond_in_aten_symbolic_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_control_flow_with_getattr_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_decomp_asserts_bad_args_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_decomp_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_defaults_ok_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_dynamic_control_flow_error_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_dynamic_dim_cleanup_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_dynamic_dim_not_1_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_dynamic_dim_range_constraint_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_graph_bypass_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_graph_bypass_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_graph_with_complex_reorder_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_graph_with_complex_reorder_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_graph_with_list_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_graph_with_list_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_identity_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_masking_with_no_grad_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_meta_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_meta_val_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_mismatched_out_2_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_mismatched_out_2_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_mismatched_out_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_mismatched_out_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_module_specify_constraints_signature_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_multi_dynamic_dim_constraint_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_multi_dynamic_dim_unsafe_relationship_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_nn_module_stack_patched_module_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_no_raise_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_no_tensor_computation_with_aten_graph_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_pass_arg_by_name_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_pass_arg_by_name_star_args_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_persist_assert_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_preserve_constraints_as_metadata_tensor_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_preserves_nn_module_stack_for_get_attr_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_raise_guard_full_constraint_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_raise_guard_partial_constraint_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_raise_on_relationship_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_shape_control_flow_1_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_specialized_int_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_symbolic_shape_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_args_and_empty_kwargs_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_args_with_default_None_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_args_with_default_float_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_args_with_default_tensor_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_args_with_default_tuple_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_builtin_op_on_assume_constant_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_cond_branches_calling_methods_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_cond_closure_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_cond_dynamic_shape_pred_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_cond_with_closed_function_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_dict_values_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_free_function_and_class_method_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_free_function_and_class_method_multiarg_diff_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_free_function_and_class_method_multiarg_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_free_function_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_global_function_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_in_unspecialized_nn_module_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_list_nonzero_free_function_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_list_nonzero_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_method_on_module_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_method_on_module_invoke_twice_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_none_control_flow_free_func_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_none_control_flow_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_not_none_control_flow_free_func_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_not_none_control_flow_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_not_none_control_flow_pos_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_not_return_const_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_constant_tuple_nonzero_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_functools_wrapped_fn_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_functools_wrapped_method_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_kwargs_and_empty_args_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_kwargs_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_kwargs_with_default_None_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_kwargs_with_default_float_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_kwargs_with_default_tensor_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_kwargs_with_default_tuple_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_map_cond_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_map_zero_sized_tensor_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_map_zero_sized_tensor_suppress_errors_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_module_layer_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_nonzero_static_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_shallow_list_copy_with_side_effects_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_shallow_list_copy_wo_side_effects_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_stack_trace_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_symbool_inputs_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_export_with_wrapped_fn_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_exported_graph_serialization_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_func_return_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_func_return_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_fx_pytree_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_immutable_list_dict_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_input_container_type_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_invalid_input_global_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_invalid_input_global_multiple_access_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_invalid_input_nonlocal_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_invalid_input_unused_nonlocal_ok_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_list_contains_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_list_not_contains_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_list_unpack_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_list_unpack_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_map_cond_param_buffer_lifted_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_mixed_real_and_fake_inputs_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_multiple_outputs_op_with_evaluator_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_nested_cond_op_param_buffer_lifted_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_no_tensor_computation_2_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_no_tensor_computation_2_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_no_tensor_computation_fail_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_no_tensor_computation_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_not_functionalize_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_param_buffer_safe_from_mutation_recurse_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_param_buffer_safe_from_mutation_simple_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_pre_dispatch_simple_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_predispatch_with_for_out_dtype_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_predispatch_with_for_out_dtype_nested_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_predispatch_with_higher_order_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_predispatch_with_higher_order_nested_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_preserve_fx_node_metadata_graph_break_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_preserve_fx_node_metadata_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_preserve_fx_node_metadata_inline_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_preserve_fx_node_metadata_recompile_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_remove_redundant_dynamic_dim_in_error_message_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_retracibility_dict_container_inp_out_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_retracibility_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_retracibility_nested_list_out_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_round_dynamic_shapes_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_strict_fake_tensor_prop_real_tensors_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_subclass_parameters_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_sum_param_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_sym_contains_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_symbolic_tracing_within_fake_mode_with_constraints_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_symbolic_tracing_within_fake_mode_with_constraints_with_parameters_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_symbool_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_torch_inference_mode_ctx_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_trivial_constraint_inline_and_install, 
test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_uncaptured_higher_order_op_error_not_suppresed_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_untracked_inputs_in_constraints_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_zeroes_in_and_out_different_shape_on_test_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_zeroes_in_and_out_different_shape_on_test_with_aten_graph_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_zeroes_in_new_shape_scalar_out_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_zeroes_in_new_shape_scalar_out_permute_dupe_and_bypass_inline_and_install, test/dynamo/test_inline_and_install.py::InlineAndInstallExportTests::test_zeroes_in_new_shape_scalar_out_permute_inline_and_install 2025-09-07T07:27:21.3005098Z 2025-09-07T07:27:21.3005341Z Running inductor/test_move_constructors_to_cuda 1/1 ... [2025-09-07 07:27:21.285398] 2025-09-07T07:27:21.3005775Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:21.3006750Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_move_constructors_to_cuda.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:21.285773] 2025-09-07T07:27:22.1416544Z 2025-09-07T07:27:22.1417545Z test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_foreach_1.1_0cb56410abc3e03e_.log 2025-09-07T07:27:22.2563181Z Running 3577 items in this shard: test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_cpu_ok_cuda, test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_exception_cuda, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_add_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_norm_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sigmoid_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_div_reciprocal_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_check_stride_ignore_dims_of_one_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes_large_input_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_ceil_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log2_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tanh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_tensors_grouping_cuda, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_uint8 2025-09-07T07:27:22.3683858Z 2025-09-07T07:27:22.3684067Z Running dynamo/test_skip_non_tensor 1/1 ... [2025-09-07 07:27:22.146278] 2025-09-07T07:27:22.3684459Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:22.3685560Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_non_tensor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:22.146633] 2025-09-07T07:27:23.5186978Z 2025-09-07T07:27:23.5187967Z torch_np/test_ufuncs_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ufuncs_basic_1.1_c2eb4c9fcb7aaef3_.log 2025-09-07T07:27:23.5320034Z Running 371 items in this shard: test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_broadcast_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc14, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc5, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc0_op0_iop0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc1_op1_iop1, 
test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc2_op2_iop2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc0_op0_iop0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc1_op1_iop1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc2_op2_iop2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype1, 
test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestUfuncDtypeKwd::test_binary_ufunc_dtype, test/torch_np/test_ufuncs_basic.py::TestUfuncDtypeKwd::test_binary_ufunc_dtype_and_out 2025-09-07T07:27:23.5445142Z 2025-09-07T07:27:23.5445338Z Running export/test_tree_utils 1/1 ... [2025-09-07 07:27:23.519342] 2025-09-07T07:27:23.5445712Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:23.5446723Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tree_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:23.519698] 2025-09-07T07:27:24.8491481Z 2025-09-07T07:27:24.8493095Z test_proxy_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_proxy_tensor_1.1_47a859b58c5d75db_.log 2025-09-07T07:27:24.8541414Z Running 173 items in this shard: test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_allclose, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_amp_cache, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_constant_blowup, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_constant_proxy_tensor_mut, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_constant_random, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_constant_unbind, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_decomp_of_capture, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_decomposition_interpreter, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_empty_like_doesnt_burn_in_defaults, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_inplace_metadata, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_isolated_graphmodule, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_make_fx_model_double_param, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_make_fx_model_fwd_bwd, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_make_fx_model_fwd_bwd_wgtupdate, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_make_fx_overloads, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_make_fx_reentrant_dispatch, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_make_fx_simple, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_mode_tracing_factory_function, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_partial_decomp, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pickle_issue89626, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pr_86917, 
test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pre_dispatch_functionalization, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pre_dispatch_functionalization_view_op, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pre_dispatch_linear, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pre_dispatch_mode_stack, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_pre_dispatch_no_grad, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_proxy_tensor, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_proxy_tensor_mode_with_decomp_table_preserves_proxy, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_resnet18_backward_trace, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_scalar_device, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_strides, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_tensor_constants, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_trace_subclasses, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_val_metadata_mutation, test/test_proxy_tensor.py::TestGenericProxyTensorReal::test_varargs, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_allclose, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_amp_cache, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_constant_blowup, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_constant_proxy_tensor_mut, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_constant_random, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_constant_unbind, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_decomp_of_capture, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_decomposition_interpreter, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_empty_like_doesnt_burn_in_defaults, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_inplace_metadata, 
test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_isolated_graphmodule, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_make_fx_model_double_param, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_make_fx_model_fwd_bwd, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_make_fx_model_fwd_bwd_wgtupdate, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_make_fx_overloads, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_make_fx_reentrant_dispatch, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_make_fx_simple, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_mode_tracing_factory_function, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_partial_decomp, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pickle_issue89626, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pr_86917, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pre_dispatch_functionalization, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pre_dispatch_functionalization_view_op, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pre_dispatch_linear, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pre_dispatch_mode_stack, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_pre_dispatch_no_grad, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_proxy_tensor, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_proxy_tensor_mode_with_decomp_table_preserves_proxy, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_resnet18_backward_trace, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_scalar_device, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_strides, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_tensor_constants, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_trace_subclasses, test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_val_metadata_mutation, 
test/test_proxy_tensor.py::TestGenericProxyTensorFake::test_varargs, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_allclose, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_amp_cache, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_constant_blowup, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_constant_proxy_tensor_mut, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_constant_random, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_constant_unbind, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_decomp_of_capture, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_decomposition_interpreter, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_empty_like_doesnt_burn_in_defaults, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_inplace_metadata, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_isolated_graphmodule, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_make_fx_model_double_param, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_make_fx_model_fwd_bwd, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_make_fx_model_fwd_bwd_wgtupdate, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_make_fx_overloads, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_make_fx_reentrant_dispatch, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_make_fx_simple, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_mode_tracing_factory_function, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_partial_decomp, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pickle_issue89626, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pr_86917, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pre_dispatch_functionalization, 
test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pre_dispatch_functionalization_view_op, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pre_dispatch_linear, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pre_dispatch_mode_stack, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_pre_dispatch_no_grad, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_proxy_tensor, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_proxy_tensor_mode_with_decomp_table_preserves_proxy, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_resnet18_backward_trace, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_scalar_device, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_strides, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_tensor_constants, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_trace_subclasses, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_val_metadata_mutation, test/test_proxy_tensor.py::TestGenericProxyTensorSymbolic::test_varargs, test/test_proxy_tensor.py::TestRealProxyTensor::test_error_on_data_dependent_ops, test/test_proxy_tensor.py::TestFakeProxyTensor::test_alias, test/test_proxy_tensor.py::TestFakeProxyTensor::test_fake_tensor_mode, test/test_proxy_tensor.py::TestFakeProxyTensor::test_free_fake, test/test_proxy_tensor.py::TestFakeProxyTensor::test_fused_adam, test/test_proxy_tensor.py::TestFakeProxyTensor::test_issue82547, test/test_proxy_tensor.py::TestFakeProxyTensor::test_meta, test/test_proxy_tensor.py::TestFakeProxyTensor::test_use_fake_and_tensor, test/test_proxy_tensor.py::TestSymbolicTracing::test_adv_index_batch, test/test_proxy_tensor.py::TestSymbolicTracing::test_arange_unbacked_output_size, test/test_proxy_tensor.py::TestSymbolicTracing::test_binary_broadcast, test/test_proxy_tensor.py::TestSymbolicTracing::test_boolean_index, 
test/test_proxy_tensor.py::TestSymbolicTracing::test_broadcast_shapes, test/test_proxy_tensor.py::TestSymbolicTracing::test_cat, test/test_proxy_tensor.py::TestSymbolicTracing::test_constant_specialization, test/test_proxy_tensor.py::TestSymbolicTracing::test_cpu_scalar_cuda, test/test_proxy_tensor.py::TestSymbolicTracing::test_cumsum_unbacked, test/test_proxy_tensor.py::TestSymbolicTracing::test_debug_interpreter, test/test_proxy_tensor.py::TestSymbolicTracing::test_deduped_shape, test/test_proxy_tensor.py::TestSymbolicTracing::test_dynamic_pointwise_scalar, test/test_proxy_tensor.py::TestSymbolicTracing::test_elementwise_meta_with_sym_numbers, test/test_proxy_tensor.py::TestSymbolicTracing::test_expand, test/test_proxy_tensor.py::TestSymbolicTracing::test_fake_tensor_as_size, test/test_proxy_tensor.py::TestSymbolicTracing::test_guard_lowerbound_range_refinement, test/test_proxy_tensor.py::TestSymbolicTracing::test_guard_lowerbound_range_refinement_multivariate, test/test_proxy_tensor.py::TestSymbolicTracing::test_guard_upperbound_range_refinement, test/test_proxy_tensor.py::TestSymbolicTracing::test_guard_upperbound_range_refinement_multivariate, test/test_proxy_tensor.py::TestSymbolicTracing::test_guards_equal, test/test_proxy_tensor.py::TestSymbolicTracing::test_int_input, test/test_proxy_tensor.py::TestSymbolicTracing::test_invalidate_nonzero, test/test_proxy_tensor.py::TestSymbolicTracing::test_invalidate_nonzero_propagate_real_tensors, test/test_proxy_tensor.py::TestSymbolicTracing::test_item, test/test_proxy_tensor.py::TestSymbolicTracing::test_item_to_constructor, test/test_proxy_tensor.py::TestSymbolicTracing::test_make_fx_with_custom_tracer_preserving_nn_module_stack, test/test_proxy_tensor.py::TestSymbolicTracing::test_mega_guard, test/test_proxy_tensor.py::TestSymbolicTracing::test_metadata, test/test_proxy_tensor.py::TestSymbolicTracing::test_metadata_fresh, test/test_proxy_tensor.py::TestSymbolicTracing::test_mod_gcd_unbacked, 
test/test_proxy_tensor.py::TestSymbolicTracing::test_multiply_shape, test/test_proxy_tensor.py::TestSymbolicTracing::test_neg_shape, test/test_proxy_tensor.py::TestSymbolicTracing::test_new_empty, test/test_proxy_tensor.py::TestSymbolicTracing::test_non_deduped_shape, test/test_proxy_tensor.py::TestSymbolicTracing::test_non_symint_size_spec, test/test_proxy_tensor.py::TestSymbolicTracing::test_nonidentity_transitive_guards, test/test_proxy_tensor.py::TestSymbolicTracing::test_reflect_r_over_x, test/test_proxy_tensor.py::TestSymbolicTracing::test_repeat_interleave, test/test_proxy_tensor.py::TestSymbolicTracing::test_repeat_interleave_unbacked_output_size, test/test_proxy_tensor.py::TestSymbolicTracing::test_reshape_divisibility_unbacked, test/test_proxy_tensor.py::TestSymbolicTracing::test_resize_from_zero, test/test_proxy_tensor.py::TestSymbolicTracing::test_return_symint, test/test_proxy_tensor.py::TestSymbolicTracing::test_rmethod, test/test_proxy_tensor.py::TestSymbolicTracing::test_setitem_symint, test/test_proxy_tensor.py::TestSymbolicTracing::test_size_with_tensor, test/test_proxy_tensor.py::TestSymbolicTracing::test_split_unbacked_sizes, test/test_proxy_tensor.py::TestSymbolicTracing::test_sqrt_size, test/test_proxy_tensor.py::TestSymbolicTracing::test_sym_storage_offset, test/test_proxy_tensor.py::TestSymbolicTracing::test_symbolic_repeat_interleave, test/test_proxy_tensor.py::TestSymbolicTracing::test_symint_to_tensor, test/test_proxy_tensor.py::TestSymbolicTracing::test_tensor_symfloat, test/test_proxy_tensor.py::TestSymbolicTracing::test_unary, test/test_proxy_tensor.py::TestSymbolicTracing::test_unbacked_batch_resnet, test/test_proxy_tensor.py::TestSymbolicTracing::test_unbacked_slice, test/test_proxy_tensor.py::TestSymbolicTracing::test_unbacked_unification, test/test_proxy_tensor.py::TestSymbolicTracing::test_unbacked_unify_dependency_violation, test/test_proxy_tensor.py::TestSymbolicTracing::test_unbacked_unify_guard, 
test/test_proxy_tensor.py::TestSymbolicTracing::test_unbacked_unify_guard_transitivity, test/test_proxy_tensor.py::TestSymbolicTracing::test_view_divisibility_unbacked, test/test_proxy_tensor.py::TestSymbolicTracing::test_view_divisibility_unbacked_relatively_prime 2025-09-07T07:27:24.8586326Z 2025-09-07T07:27:24.8586518Z Running dynamo/test_frame_init 1/1 ... [2025-09-07 07:27:24.849365] 2025-09-07T07:27:24.8586891Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:24.8588021Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_frame_init.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:24.849718] 2025-09-07T07:27:25.1985046Z 2025-09-07T07:27:25.1986072Z inductor/test_fx_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fx_fusion_1.1_6eb2c0407bb4d31d_.log 2025-09-07T07:27:25.1988140Z Running 4 items in this shard: test/inductor/test_fx_fusion.py::TestFxFusion::test_linear_permute_fusion, test/inductor/test_fx_fusion.py::TestFxFusion::test_permute_bmm_fusion, test/inductor/test_fx_fusion.py::TestFxFusion::test_permute_linear_fusion, test/inductor/test_fx_fusion.py::TestFxFusion::test_sink_cat_after_pointwise 2025-09-07T07:27:25.1989571Z 2025-09-07T07:27:25.1989789Z Running torch_np/test_dtype 1/1 ... [2025-09-07 07:27:25.198640] 2025-09-07T07:27:25.1990250Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:25.1991477Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_dtype.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:25.198955] 2025-09-07T07:27:25.9667594Z 2025-09-07T07:27:25.9668653Z inductor/test_smoke 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_smoke_1.1_43cd6e361d71319b_.log 2025-09-07T07:27:25.9669433Z 2025-09-07T07:27:25.9670336Z Running inductor/test_indexing 1/1 ... [2025-09-07 07:27:25.966884] 2025-09-07T07:27:25.9670829Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:25.9674359Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_indexing.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:25.967198] 2025-09-07T07:27:26.2169664Z 2025-09-07T07:27:26.2171901Z dynamo/test_skip_non_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_non_tensor_1.1_546af14e31c7112a_.log 2025-09-07T07:27:26.2175409Z Running 8 items in this shard: test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_skip, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor1, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor2, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor_dict, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor_list, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_custom_list, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_do_not_skip_side_effects, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_recursive_list 2025-09-07T07:27:26.2177616Z 2025-09-07T07:27:26.2177839Z Running inductor/test_minifier_utils 1/1 ... 
[2025-09-07 07:27:26.217119] 2025-09-07T07:27:26.2178248Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:26.2179215Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:26.217519] 2025-09-07T07:27:27.1898260Z 2025-09-07T07:27:27.1899512Z export/test_tree_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tree_utils_1.1_b3a2fcd402a56c6d_.log 2025-09-07T07:27:27.1901408Z Running 2 items in this shard: test/export/test_tree_utils.py::TestTreeUtils::test_equivalence_check, test/export/test_tree_utils.py::TestTreeUtils::test_reorder_kwargs 2025-09-07T07:27:27.1902358Z 2025-09-07T07:27:27.1902556Z Running test_typing 1/1 ... [2025-09-07 07:27:27.190015] 2025-09-07T07:27:27.1903274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:27.1906513Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_typing.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:27.190443] 2025-09-07T07:27:28.6107526Z 2025-09-07T07:27:28.6108751Z inductor/test_move_constructors_to_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_move_constructors_to_cuda_1.1_be653c4e2535579b_.log 2025-09-07T07:27:28.6112619Z Running 7 items in this shard: test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_multi_gpu, test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_multiple_constructors, test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_no_gpu, test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_non_convertable_op_failure, test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_output_failure, test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_sets_equiv, test/inductor/test_move_constructors_to_cuda.py::TestMoveConstructorsToCuda::test_simple 2025-09-07T07:27:28.6115628Z 2025-09-07T07:27:28.6115928Z Running functorch/test_aot_joint_with_descriptors 1/1 ... [2025-09-07 07:27:28.610921] 2025-09-07T07:27:28.6116465Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:28.6117718Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aot_joint_with_descriptors.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:28.611280] 2025-09-07T07:27:28.6195442Z 2025-09-07T07:27:28.6196065Z dynamo/test_frame_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_frame_init_1.1_8edde8629f30fc88_.log 2025-09-07T07:27:28.6197098Z Running 1 items in this shard: test/dynamo/test_frame_init.py::FrameInitTests::test_frame_init 2025-09-07T07:27:28.6197777Z 2025-09-07T07:27:28.6199857Z Running test_utils_filelock 1/1 ... [2025-09-07 07:27:28.619834] 2025-09-07T07:27:28.6200330Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:28.6203831Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_utils_filelock.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:28.620184] 2025-09-07T07:27:29.2188235Z 2025-09-07T07:27:29.2189201Z torch_np/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_dtype_1.1_9ffcaefd28c21c50_.log 2025-09-07T07:27:29.2204330Z Running 44 items in this shard: test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int8', 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_bool, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.bool_, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex128, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.dtype('bool'), 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int8, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint8 2025-09-07T07:27:29.2216122Z 2025-09-07T07:27:29.2216335Z Running inductor/test_torchinductor 1/1 ... [2025-09-07 07:27:29.218979] 2025-09-07T07:27:29.2216723Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:29.2217675Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:29.219321] 2025-09-07T07:27:30.1876767Z 2025-09-07T07:27:30.1877550Z inductor/test_minifier_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_utils_1.1_313b7ab9495690d4_.log 2025-09-07T07:27:30.1879458Z Running 3 items in this shard: test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_convert_module_to_string, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_invalid_output, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_non_exportable 2025-09-07T07:27:30.1880705Z 2025-09-07T07:27:30.1880934Z Running inductor/test_metrics 1/1 ... [2025-09-07 07:27:30.187930] 2025-09-07T07:27:30.1881405Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:30.1884759Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_metrics.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:30.188288] 2025-09-07T07:27:32.1985497Z 2025-09-07T07:27:32.1986801Z test_transformers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_1.1_910aacadd771b421_.log 2025-09-07T07:27:33.0093930Z Running 12244 items in this shard: test/test_transformers.py::TestTransformersCUDA::test_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_only_layer_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_disable_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_is_causal_gpu_cuda, test/test_transformers.py::TestTransformersCUDA::test_kpm_mask_trailing_column_with_nested_tensor_cuda, test/test_transformers.py::TestTransformersCUDA::test_mask_check_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_math_backend_high_precision_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_bool_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_mha_in_proj_weight_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_self_attn_TxT_attn_mask_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_pad_and_catch_error_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformer_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_256_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_3_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_1_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_8_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_model_cuda, test/test_transformers.py::TestTransformersCUDA::test_with_nested_tensor_input_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_dispatch_fails_no_backend_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_atteention_large_bf16_nan_values_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_attention_fail_with_non_square_causal_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_bfloat16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_float16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_fail_fp32_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_error_cases_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_attn_mask_present_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel0_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sdpa_kernel_grouped_query_attention_cuda_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel1_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_error_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_large_seq_len_uniform_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_efficient_fail_bfloat16_less_than_sm80_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_nested_fails_on_padding_head_dim_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_unaligned_tensors_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_math_with_negative_scale_kernel0_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_False_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_d256_heuristic_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_fail_d128_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_gqa_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_nonmodulo64seqlen_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_preserves_query_layout_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_trivial_output_transpose_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_query_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_nested_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contig_mask_bug_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_backwards_determinism_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_2_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_3_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_4_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel1_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_float16_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_singelton_head_dim_stride_ne_1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, 
test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_and_mask_fails_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape3_cuda 2025-09-07T07:27:33.7878588Z 2025-09-07T07:27:33.7878881Z Running inductor/test_coordinate_descent_tuner 1/1 ... [2025-09-07 07:27:32.228955] 2025-09-07T07:27:33.7879346Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:33.7880335Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_coordinate_descent_tuner.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:32.229338] 2025-09-07T07:27:33.7881267Z 2025-09-07T07:27:33.7881689Z test_utils_filelock 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_filelock_1.1_05a578f17c033f80_.log 2025-09-07T07:27:33.7882885Z Running 2 items in this shard: test/test_utils_filelock.py::TestFileLock::test_no_crash, test/test_utils_filelock.py::TestFileLock::test_sequencing 2025-09-07T07:27:33.7883400Z 2025-09-07T07:27:33.7883633Z Running inductor/test_foreach 1/1 ... [2025-09-07 07:27:32.540297] 2025-09-07T07:27:33.7884023Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:33.7884940Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_foreach.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:32.540623] 2025-09-07T07:27:33.7885722Z 2025-09-07T07:27:33.7886301Z functorch/test_aot_joint_with_descriptors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aot_joint_with_descriptors_1.1_0c212763e8de3899_.log 2025-09-07T07:27:33.7890576Z Running 10 items in this shard: test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_conv_bn_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_export_and_compile, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_conv_bn_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_multiple_outputs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_node_consistency, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_simple_linear, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_in_out_specs, 
test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_module_with_kwargs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_multiple_outputs_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_simple_linear_module 2025-09-07T07:27:33.7894456Z 2025-09-07T07:27:33.7894655Z Running backends/xeon/test_launch 1/1 ... [2025-09-07 07:27:32.681980] 2025-09-07T07:27:33.7895041Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:33.7896083Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'backends/xeon/test_launch.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:32.682352] 2025-09-07T07:27:33.7896886Z 2025-09-07T07:27:33.7897341Z inductor/test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_indexing_1.1_138aa76e8817fae9_.log 2025-09-07T07:27:33.7903578Z Running 21 items in this shard: test/inductor/test_indexing.py::TestIndexingSimplification::test_expand_floor_div_applied, test/inductor/test_indexing.py::TestIndexingSimplification::test_expand_floor_div_skipped, test/inductor/test_indexing.py::TestIndexingSimplification::test_indexing_join, test/inductor/test_indexing.py::TestIndexingSimplification::test_indexing_simplification, test/inductor/test_indexing.py::TestIndexingSimplification::test_int8_unpack, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_pairs_merged, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_pairs_not_merged, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_positive, test/inductor/test_indexing.py::ExprPrinterTests::test_print_Min_Max, test/inductor/test_indexing.py::ExprPrinterTests::test_print_ceil, 
test/inductor/test_indexing.py::ExprPrinterTests::test_print_floor, test/inductor/test_indexing.py::ExprPrinterTests::test_print_floor_div, test/inductor/test_indexing.py::ExprPrinterTests::test_print_integer, test/inductor/test_indexing.py::ExprPrinterTests::test_print_mod, test/inductor/test_indexing.py::ExprPrinterTests::test_print_mod_index, test/inductor/test_indexing.py::ExprPrinterTests::test_print_pow, test/inductor/test_indexing.py::ExprPrinterTests::test_print_python_mod, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_-1, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_0, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_1 2025-09-07T07:27:33.7909540Z 2025-09-07T07:27:33.7909728Z Running dynamo/test_functions 1/1 ... [2025-09-07 07:27:33.392087] 2025-09-07T07:27:33.7910094Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:33.7911000Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_functions.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:33.392390] 2025-09-07T07:27:37.9042110Z 2025-09-07T07:27:37.9043532Z backends/xeon/test_launch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/backends.xeon.test_launch_1.1_9584f35451753f31_.log 2025-09-07T07:27:37.9045573Z Running 2 items in this shard: test/backends/xeon/test_launch.py::TestTorchrun::test_cpu_info, test/backends/xeon/test_launch.py::TestTorchrun::test_multi_threads 2025-09-07T07:27:37.9046591Z 2025-09-07T07:27:37.9046982Z Running inductor/test_torchinductor_opinfo 1/12 ... 
[2025-09-07 07:27:37.904346] 2025-09-07T07:27:37.9047684Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:37.9049763Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=1', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:37.904752] 2025-09-07T07:27:38.0635704Z 2025-09-07T07:27:38.0636410Z inductor/test_metrics 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_metrics_1.1_f78a252b8c96abf7_.log 2025-09-07T07:27:38.0639980Z Running 6 items in this shard: test/inductor/test_metrics.py::TestMetrics::test_atomic_add, test/inductor/test_metrics.py::TestMetrics::test_count_args, test/inductor/test_metrics.py::TestMetrics::test_count_pattern, test/inductor/test_metrics.py::TestMetrics::test_kernel_args_num_gb, test/inductor/test_metrics.py::TestMetrics::test_parse_proper_kernel_fn_code, test/inductor/test_metrics.py::TestMetrics::test_parse_reduction_hint 2025-09-07T07:27:38.0642480Z 2025-09-07T07:27:38.0642888Z Running inductor/test_torchinductor_opinfo 4/12 ... [2025-09-07 07:27:38.063805] 2025-09-07T07:27:38.0643600Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:38.0645291Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=4', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:38.064161] 2025-09-07T07:27:40.1047800Z 2025-09-07T07:27:40.1049938Z inductor/test_coordinate_descent_tuner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_coordinate_descent_tuner_1.1_3ec3389ecf26b7c1_.log 2025-09-07T07:27:40.1054588Z Running 5 items in this shard: test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_abs_function, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_get_neighbour_values, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_no_neighbors, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_persistent_reduction, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_value_too_large 2025-09-07T07:27:40.1057277Z 2025-09-07T07:27:40.1057532Z Running inductor/test_torchinductor_opinfo 5/12 ... [2025-09-07 07:27:40.104841] 2025-09-07T07:27:40.1058133Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:40.1059172Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=5', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:40.105202] 2025-09-07T07:27:42.4196553Z 2025-09-07T07:27:42.4197416Z dynamo/test_functions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_functions_1.1_8c0c7ca01be5b38b_.log 2025-09-07T07:27:42.4310752Z Running 469 items in this shard: test/dynamo/test_functions.py::FunctionTests::test_T, test/dynamo/test_functions.py::FunctionTests::test_add, test/dynamo/test_functions.py::FunctionTests::test_add_, test/dynamo/test_functions.py::FunctionTests::test_addcdiv, test/dynamo/test_functions.py::FunctionTests::test_addcdiv_, test/dynamo/test_functions.py::FunctionTests::test_addcmul_, test/dynamo/test_functions.py::FunctionTests::test_are_functorch_transforms_active, test/dynamo/test_functions.py::FunctionTests::test_attrgetter, test/dynamo/test_functions.py::FunctionTests::test_broadcast_foreach_pow, test/dynamo/test_functions.py::FunctionTests::test_build_list_unpack, test/dynamo/test_functions.py::FunctionTests::test_call_dict1, test/dynamo/test_functions.py::FunctionTests::test_call_dict2, test/dynamo/test_functions.py::FunctionTests::test_call_dict3, test/dynamo/test_functions.py::FunctionTests::test_call_dict4, test/dynamo/test_functions.py::FunctionTests::test_call_dict5, test/dynamo/test_functions.py::FunctionTests::test_callable_builtin, test/dynamo/test_functions.py::FunctionTests::test_callable_class, test/dynamo/test_functions.py::FunctionTests::test_callable_lambda, test/dynamo/test_functions.py::FunctionTests::test_callable_list, test/dynamo/test_functions.py::FunctionTests::test_callable_torch, test/dynamo/test_functions.py::FunctionTests::test_chunks1, test/dynamo/test_functions.py::FunctionTests::test_class_dict, test/dynamo/test_functions.py::FunctionTests::test_cls_eq, test/dynamo/test_functions.py::FunctionTests::test_cls_hasattr, test/dynamo/test_functions.py::FunctionTests::test_cls_is, test/dynamo/test_functions.py::FunctionTests::test_compare_constant_and_tensor, 
test/dynamo/test_functions.py::FunctionTests::test_complex_closure, test/dynamo/test_functions.py::FunctionTests::test_const_tuple_add1, test/dynamo/test_functions.py::FunctionTests::test_const_tuple_add2, test/dynamo/test_functions.py::FunctionTests::test_constant1, test/dynamo/test_functions.py::FunctionTests::test_constant2, test/dynamo/test_functions.py::FunctionTests::test_constant3, test/dynamo/test_functions.py::FunctionTests::test_constant4, test/dynamo/test_functions.py::FunctionTests::test_constant_set, test/dynamo/test_functions.py::FunctionTests::test_context_wrapping_nested_functions_no_closure, test/dynamo/test_functions.py::FunctionTests::test_cublas_allow_tf32, test/dynamo/test_functions.py::FunctionTests::test_custom_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_default_dict_closure, test/dynamo/test_functions.py::FunctionTests::test_default_dict_constr, test/dynamo/test_functions.py::FunctionTests::test_default_dict_dict, test/dynamo/test_functions.py::FunctionTests::test_default_dict_lambda, test/dynamo/test_functions.py::FunctionTests::test_default_dict_list, test/dynamo/test_functions.py::FunctionTests::test_default_dict_set, test/dynamo/test_functions.py::FunctionTests::test_default_dict_tuple, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault1, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault2, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault3, test/dynamo/test_functions.py::FunctionTests::test_del, test/dynamo/test_functions.py::FunctionTests::test_deque, test/dynamo/test_functions.py::FunctionTests::test_device, test/dynamo/test_functions.py::FunctionTests::test_device_constant, test/dynamo/test_functions.py::FunctionTests::test_dict_copy, test/dynamo/test_functions.py::FunctionTests::test_dict_fromkeys, test/dynamo/test_functions.py::FunctionTests::test_dict_hasattr, test/dynamo/test_functions.py::FunctionTests::test_dict_id_guard, 
test/dynamo/test_functions.py::FunctionTests::test_dict_items_sorted, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set1, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set2, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set3, test/dynamo/test_functions.py::FunctionTests::test_dict_keys, test/dynamo/test_functions.py::FunctionTests::test_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_dict_mutable_map, test/dynamo/test_functions.py::FunctionTests::test_dict_ops, test/dynamo/test_functions.py::FunctionTests::test_dict_param_keys, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault1, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault2, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault3, test/dynamo/test_functions.py::FunctionTests::test_dict_sorted, test/dynamo/test_functions.py::FunctionTests::test_dict_tuple_lazy_guard, test/dynamo/test_functions.py::FunctionTests::test_dict_update, test/dynamo/test_functions.py::FunctionTests::test_dict_update_kwargs, test/dynamo/test_functions.py::FunctionTests::test_dict_values, test/dynamo/test_functions.py::FunctionTests::test_distributed_is_available, test/dynamo/test_functions.py::FunctionTests::test_distributed_is_initialized, test/dynamo/test_functions.py::FunctionTests::test_dtype, test/dynamo/test_functions.py::FunctionTests::test_dtype_compare, test/dynamo/test_functions.py::FunctionTests::test_elipsis, test/dynamo/test_functions.py::FunctionTests::test_enumerate, test/dynamo/test_functions.py::FunctionTests::test_enumerate_custom, test/dynamo/test_functions.py::FunctionTests::test_enumerate_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter, test/dynamo/test_functions.py::FunctionTests::test_filter_fallback, test/dynamo/test_functions.py::FunctionTests::test_filter_graph_break_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter_infinite_iterator, 
test/dynamo/test_functions.py::FunctionTests::test_filter_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter_with_graph_break, test/dynamo/test_functions.py::FunctionTests::test_finfo, test/dynamo/test_functions.py::FunctionTests::test_flat_param_same_storage_size, test/dynamo/test_functions.py::FunctionTests::test_float, test/dynamo/test_functions.py::FunctionTests::test_fn_with_self_set, test/dynamo/test_functions.py::FunctionTests::test_foreach_lerp_, test/dynamo/test_functions.py::FunctionTests::test_fstrings1, test/dynamo/test_functions.py::FunctionTests::test_fstrings2, test/dynamo/test_functions.py::FunctionTests::test_fstrings3, test/dynamo/test_functions.py::FunctionTests::test_fstrings4, test/dynamo/test_functions.py::FunctionTests::test_fstrings5, test/dynamo/test_functions.py::FunctionTests::test_fstrings6, test/dynamo/test_functions.py::FunctionTests::test_funcdef_closure, test/dynamo/test_functions.py::FunctionTests::test_functools_cache_guard, test/dynamo/test_functions.py::FunctionTests::test_functools_partial, test/dynamo/test_functions.py::FunctionTests::test_functools_partial_binding, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_hasattr, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_subclass, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_user_methods, test/dynamo/test_functions.py::FunctionTests::test_get_autocast_gpu_dtype, test/dynamo/test_functions.py::FunctionTests::test_get_calculate_correct_fan, test/dynamo/test_functions.py::FunctionTests::test_get_default_dtype, test/dynamo/test_functions.py::FunctionTests::test_get_device_properties_tensor_device, test/dynamo/test_functions.py::FunctionTests::test_get_privateuse1_name, test/dynamo/test_functions.py::FunctionTests::test_getattr, test/dynamo/test_functions.py::FunctionTests::test_getattr_metaclass, test/dynamo/test_functions.py::FunctionTests::test_globalfn, 
test/dynamo/test_functions.py::FunctionTests::test_globalmodule, test/dynamo/test_functions.py::FunctionTests::test_globalvar, test/dynamo/test_functions.py::FunctionTests::test_import1, test/dynamo/test_functions.py::FunctionTests::test_in_not_in, test/dynamo/test_functions.py::FunctionTests::test_index, test/dynamo/test_functions.py::FunctionTests::test_indexed_range, test/dynamo/test_functions.py::FunctionTests::test_indirect1, test/dynamo/test_functions.py::FunctionTests::test_indirect2, test/dynamo/test_functions.py::FunctionTests::test_indirect3, test/dynamo/test_functions.py::FunctionTests::test_inline_jit__unwrap_optional, test/dynamo/test_functions.py::FunctionTests::test_inline_jit_annotations, test/dynamo/test_functions.py::FunctionTests::test_inline_lru_cache_fn_with_default_args, test/dynamo/test_functions.py::FunctionTests::test_inline_script_if_tracing_fn_with_default_args, test/dynamo/test_functions.py::FunctionTests::test_inline_softmax, test/dynamo/test_functions.py::FunctionTests::test_inline_with_default, test/dynamo/test_functions.py::FunctionTests::test_inner_function, test/dynamo/test_functions.py::FunctionTests::test_is, test/dynamo/test_functions.py::FunctionTests::test_is_any_autocast_enabled, test/dynamo/test_functions.py::FunctionTests::test_is_checkpoint_valid, test/dynamo/test_functions.py::FunctionTests::test_is_complex, test/dynamo/test_functions.py::FunctionTests::test_is_contiguous_frame_counts, test/dynamo/test_functions.py::FunctionTests::test_is_contiguous_memory_format, test/dynamo/test_functions.py::FunctionTests::test_is_floating_point, test/dynamo/test_functions.py::FunctionTests::test_is_fx_tracing, test/dynamo/test_functions.py::FunctionTests::test_is_in_onnx_export, test/dynamo/test_functions.py::FunctionTests::test_is_inference_mode_global_recompilation, test/dynamo/test_functions.py::FunctionTests::test_is_inference_recompilation, test/dynamo/test_functions.py::FunctionTests::test_is_integer, 
test/dynamo/test_functions.py::FunctionTests::test_is_not, test/dynamo/test_functions.py::FunctionTests::test_is_not_null, test/dynamo/test_functions.py::FunctionTests::test_is_quantized, test/dynamo/test_functions.py::FunctionTests::test_is_sparse, test/dynamo/test_functions.py::FunctionTests::test_isinstance, test/dynamo/test_functions.py::FunctionTests::test_islice_chain, test/dynamo/test_functions.py::FunctionTests::test_itemgetter, test/dynamo/test_functions.py::FunctionTests::test_itertools_chain, test/dynamo/test_functions.py::FunctionTests::test_itertools_chain_from_iterable, test/dynamo/test_functions.py::FunctionTests::test_itertools_combinations, test/dynamo/test_functions.py::FunctionTests::test_itertools_compress, test/dynamo/test_functions.py::FunctionTests::test_itertools_compress_tensors, test/dynamo/test_functions.py::FunctionTests::test_itertools_filterfalse_basic, test/dynamo/test_functions.py::FunctionTests::test_itertools_pairwise, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_args, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_basic, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_various_iterators, test/dynamo/test_functions.py::FunctionTests::test_itertools_product, test/dynamo/test_functions.py::FunctionTests::test_itertools_product_args, test/dynamo/test_functions.py::FunctionTests::test_itertools_product_various_iterators, test/dynamo/test_functions.py::FunctionTests::test_itertools_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_jit_annotate, test/dynamo/test_functions.py::FunctionTests::test_len_constant_dict, test/dynamo/test_functions.py::FunctionTests::test_len_constant_list, test/dynamo/test_functions.py::FunctionTests::test_len_constant_misc_iterables, test/dynamo/test_functions.py::FunctionTests::test_len_tensor, test/dynamo/test_functions.py::FunctionTests::test_list_add, 
test/dynamo/test_functions.py::FunctionTests::test_list_add_then_mutate, test/dynamo/test_functions.py::FunctionTests::test_list_clear, test/dynamo/test_functions.py::FunctionTests::test_list_compare_polyfill, test/dynamo/test_functions.py::FunctionTests::test_list_compare_polyfill_non_lists, test/dynamo/test_functions.py::FunctionTests::test_list_convert, test/dynamo/test_functions.py::FunctionTests::test_list_expand_lhs, test/dynamo/test_functions.py::FunctionTests::test_list_index_with_constant_tensor, test/dynamo/test_functions.py::FunctionTests::test_list_reversed, test/dynamo/test_functions.py::FunctionTests::test_list_setitem, test/dynamo/test_functions.py::FunctionTests::test_list_setitem_slice, test/dynamo/test_functions.py::FunctionTests::test_list_slice, test/dynamo/test_functions.py::FunctionTests::test_list_slice_assignment, test/dynamo/test_functions.py::FunctionTests::test_list_sorted1, test/dynamo/test_functions.py::FunctionTests::test_list_sorted2, test/dynamo/test_functions.py::FunctionTests::test_list_truth, test/dynamo/test_functions.py::FunctionTests::test_listarg1, test/dynamo/test_functions.py::FunctionTests::test_listarg2, test/dynamo/test_functions.py::FunctionTests::test_listarg3, test/dynamo/test_functions.py::FunctionTests::test_listarg4, test/dynamo/test_functions.py::FunctionTests::test_listarg5, test/dynamo/test_functions.py::FunctionTests::test_load_global_bool, test/dynamo/test_functions.py::FunctionTests::test_lru_cache_warning_issued_during_tracing, test/dynamo/test_functions.py::FunctionTests::test_mT, test/dynamo/test_functions.py::FunctionTests::test_manual_seed, test/dynamo/test_functions.py::FunctionTests::test_map_call_function_ex, test/dynamo/test_functions.py::FunctionTests::test_map_deque_extendleft, test/dynamo/test_functions.py::FunctionTests::test_map_dict_fromkeys, test/dynamo/test_functions.py::FunctionTests::test_map_enumerate, test/dynamo/test_functions.py::FunctionTests::test_map_infinite, 
test/dynamo/test_functions.py::FunctionTests::test_map_iter, test/dynamo/test_functions.py::FunctionTests::test_map_list, test/dynamo/test_functions.py::FunctionTests::test_map_list_extend, test/dynamo/test_functions.py::FunctionTests::test_map_list_slice_assign, test/dynamo/test_functions.py::FunctionTests::test_map_max, test/dynamo/test_functions.py::FunctionTests::test_map_max_const, test/dynamo/test_functions.py::FunctionTests::test_map_partial_unpack, test/dynamo/test_functions.py::FunctionTests::test_map_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_map_reduce, test/dynamo/test_functions.py::FunctionTests::test_map_return, test/dynamo/test_functions.py::FunctionTests::test_map_set, test/dynamo/test_functions.py::FunctionTests::test_map_sorted, test/dynamo/test_functions.py::FunctionTests::test_map_str_join, test/dynamo/test_functions.py::FunctionTests::test_map_sum, test/dynamo/test_functions.py::FunctionTests::test_map_tuple, test/dynamo/test_functions.py::FunctionTests::test_map_unpack_twice, test/dynamo/test_functions.py::FunctionTests::test_map_unpack_vars, test/dynamo/test_functions.py::FunctionTests::test_map_with_graph_break, test/dynamo/test_functions.py::FunctionTests::test_map_zip_dict, test/dynamo/test_functions.py::FunctionTests::test_math_radians, test/dynamo/test_functions.py::FunctionTests::test_mean_sum_np, test/dynamo/test_functions.py::FunctionTests::test_methodcall1, test/dynamo/test_functions.py::FunctionTests::test_methodcall2, test/dynamo/test_functions.py::FunctionTests::test_methodcall3, test/dynamo/test_functions.py::FunctionTests::test_methodcaller, test/dynamo/test_functions.py::FunctionTests::test_min_max, test/dynamo/test_functions.py::FunctionTests::test_module_constant, test/dynamo/test_functions.py::FunctionTests::test_namedtuple, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_defaults, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_fields, 
test/dynamo/test_functions.py::FunctionTests::test_namedtuple_hasattr, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_replace, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_subclass, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_user_methods, test/dynamo/test_functions.py::FunctionTests::test_ndarray_builtin_functions, test/dynamo/test_functions.py::FunctionTests::test_ndarray_method, test/dynamo/test_functions.py::FunctionTests::test_ndarray_methods_returning_scalar, test/dynamo/test_functions.py::FunctionTests::test_ndarray_reshape, test/dynamo/test_functions.py::FunctionTests::test_ndarray_transpose, test/dynamo/test_functions.py::FunctionTests::test_ndim, test/dynamo/test_functions.py::FunctionTests::test_no_recompile_inner_function, test/dynamo/test_functions.py::FunctionTests::test_no_recompile_inner_lambda, test/dynamo/test_functions.py::FunctionTests::test_non_inlined_closure, test/dynamo/test_functions.py::FunctionTests::test_not_list, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_as_input_int_or_float_float, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_as_input_int_or_float_int, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_guards_float, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_guards_int, test/dynamo/test_functions.py::FunctionTests::test_np_finfo, test/dynamo/test_functions.py::FunctionTests::test_np_iinfo, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_as_integer_ratio_num_type0, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_as_integer_ratio_num_type3, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_bit_length_num_type1, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_conjugate_num_type2, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_conjugate_num_type4, 
test/dynamo/test_functions.py::FunctionTests::test_number_method_method_hex_num_type5, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_is_integer_num_type6, test/dynamo/test_functions.py::FunctionTests::test_numpy_attributes, test/dynamo/test_functions.py::FunctionTests::test_numpy_dtype_argument_to_function, test/dynamo/test_functions.py::FunctionTests::test_numpy_dtype_call_in_function, test/dynamo/test_functions.py::FunctionTests::test_numpy_fft, test/dynamo/test_functions.py::FunctionTests::test_numpy_linalg, test/dynamo/test_functions.py::FunctionTests::test_numpy_meshgrid, test/dynamo/test_functions.py::FunctionTests::test_numpy_random, test/dynamo/test_functions.py::FunctionTests::test_numpy_size, test/dynamo/test_functions.py::FunctionTests::test_obj_eq, test/dynamo/test_functions.py::FunctionTests::test_obj_is, test/dynamo/test_functions.py::FunctionTests::test_ordered_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_partial_across_graph_break_uninvoked, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_UDF, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_partials_lambda, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_partials_mod, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_args_and_kwargs, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_mix, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_mix_no_source, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___annotations__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___builtins__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___call__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___class__, 
test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___closure__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___code__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___defaults__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___delattr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___dict__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___dir__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___doc__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___eq__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___format__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___ge__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___get__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___getattribute__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___globals__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___gt__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___hash__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___init__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___init_subclass__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___kwdefaults__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___le__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___lt__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___module__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___name__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___ne__, 
test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___new__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___qualname__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___reduce__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___reduce_ex__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___repr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___setattr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___sizeof__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___str__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___subclasshook__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_args, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_func, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_keywords, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_set_attr, test/dynamo/test_functions.py::FunctionTests::test_partials_lambda, test/dynamo/test_functions.py::FunctionTests::test_partials_recompilation, test/dynamo/test_functions.py::FunctionTests::test_partials_torch_op_arg, test/dynamo/test_functions.py::FunctionTests::test_partials_torch_op_kwarg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_arg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg_method, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg_module, test/dynamo/test_functions.py::FunctionTests::test_pop, test/dynamo/test_functions.py::FunctionTests::test_pos, test/dynamo/test_functions.py::FunctionTests::test_pow_int, test/dynamo/test_functions.py::FunctionTests::test_promote_types, test/dynamo/test_functions.py::FunctionTests::test_rand_inlined, 
test/dynamo/test_functions.py::FunctionTests::test_rand_tensor_partial, test/dynamo/test_functions.py::FunctionTests::test_range1, test/dynamo/test_functions.py::FunctionTests::test_range2, test/dynamo/test_functions.py::FunctionTests::test_range_iterator, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_2, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_graph_break, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_graph_break_2, test/dynamo/test_functions.py::FunctionTests::test_range_length, test/dynamo/test_functions.py::FunctionTests::test_range_with_index, test/dynamo/test_functions.py::FunctionTests::test_range_with_slice_index, test/dynamo/test_functions.py::FunctionTests::test_reduce, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_initial, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_none_initial, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_single, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_single_with_initial, test/dynamo/test_functions.py::FunctionTests::test_return_dict, test/dynamo/test_functions.py::FunctionTests::test_return_dict2, test/dynamo/test_functions.py::FunctionTests::test_return_multiple_numpy_ndarray, test/dynamo/test_functions.py::FunctionTests::test_return_numpy_ndarray, test/dynamo/test_functions.py::FunctionTests::test_return_tuple1, test/dynamo/test_functions.py::FunctionTests::test_return_tuple2, test/dynamo/test_functions.py::FunctionTests::test_returning_recursive_func, test/dynamo/test_functions.py::FunctionTests::test_round, test/dynamo/test_functions.py::FunctionTests::test_set_add, test/dynamo/test_functions.py::FunctionTests::test_set_in_frozenset, test/dynamo/test_functions.py::FunctionTests::test_set_keys_view, test/dynamo/test_functions.py::FunctionTests::test_set_update_bytecode, test/dynamo/test_functions.py::FunctionTests::test_set_update_list_with_duplicated_items, 
test/dynamo/test_functions.py::FunctionTests::test_shape1, test/dynamo/test_functions.py::FunctionTests::test_shape2, test/dynamo/test_functions.py::FunctionTests::test_size_tuple_add, test/dynamo/test_functions.py::FunctionTests::test_slice1, test/dynamo/test_functions.py::FunctionTests::test_slice2, test/dynamo/test_functions.py::FunctionTests::test_slice3, test/dynamo/test_functions.py::FunctionTests::test_slice4, test/dynamo/test_functions.py::FunctionTests::test_slice5, test/dynamo/test_functions.py::FunctionTests::test_slice6, test/dynamo/test_functions.py::FunctionTests::test_slice_eq, test/dynamo/test_functions.py::FunctionTests::test_sliced_range, test/dynamo/test_functions.py::FunctionTests::test_sorted_const_key_non_const_items, test/dynamo/test_functions.py::FunctionTests::test_sourceless_build_method_type, test/dynamo/test_functions.py::FunctionTests::test_startswith, test/dynamo/test_functions.py::FunctionTests::test_sum, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut_with_start_arg, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut_with_start_kwarg, test/dynamo/test_functions.py::FunctionTests::test_sum_with_start_arg, test/dynamo/test_functions.py::FunctionTests::test_sum_with_start_kwarg, test/dynamo/test_functions.py::FunctionTests::test_symbool_to_int, test/dynamo/test_functions.py::FunctionTests::test_tensor_dim, test/dynamo/test_functions.py::FunctionTests::test_tensor_element_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_is_complex, test/dynamo/test_functions.py::FunctionTests::test_tensor_len, test/dynamo/test_functions.py::FunctionTests::test_tensor_new_with_shape, test/dynamo/test_functions.py::FunctionTests::test_tensor_new_with_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_size_indexed_by_symint, 
test/dynamo/test_functions.py::FunctionTests::test_tensor_type, test/dynamo/test_functions.py::FunctionTests::test_tensor_type2, test/dynamo/test_functions.py::FunctionTests::test_tensor_type3, test/dynamo/test_functions.py::FunctionTests::test_tensor_type4, test/dynamo/test_functions.py::FunctionTests::test_tensor_type5, test/dynamo/test_functions.py::FunctionTests::test_to, test/dynamo/test_functions.py::FunctionTests::test_torch_distributions_functions, test/dynamo/test_functions.py::FunctionTests::test_torch_from_numpy, test/dynamo/test_functions.py::FunctionTests::test_torch_get_device_module, test/dynamo/test_functions.py::FunctionTests::test_torch_size_as_dict_key, test/dynamo/test_functions.py::FunctionTests::test_torch_size_hasattr, test/dynamo/test_functions.py::FunctionTests::test_torch_source, test/dynamo/test_functions.py::FunctionTests::test_transpose_for_scores, test/dynamo/test_functions.py::FunctionTests::test_truth, test/dynamo/test_functions.py::FunctionTests::test_tuple1, test/dynamo/test_functions.py::FunctionTests::test_tuple2, test/dynamo/test_functions.py::FunctionTests::test_tuple_contains, test/dynamo/test_functions.py::FunctionTests::test_tuple_iadd, test/dynamo/test_functions.py::FunctionTests::test_tuple_map, test/dynamo/test_functions.py::FunctionTests::test_tuple_sorted, test/dynamo/test_functions.py::FunctionTests::test_two_point_iter, test/dynamo/test_functions.py::FunctionTests::test_unary_fold_op, test/dynamo/test_functions.py::FunctionTests::test_unary_fold_op_seq, test/dynamo/test_functions.py::FunctionTests::test_unpack1, test/dynamo/test_functions.py::FunctionTests::test_unpack2, test/dynamo/test_functions.py::FunctionTests::test_unpack3, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex1, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex2, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex3, test/dynamo/test_functions.py::FunctionTests::test_unpack_mutable_map, 
test/dynamo/test_functions.py::FunctionTests::test_unsqueeze_inplace, test/dynamo/test_functions.py::FunctionTests::test_viamethod, test/dynamo/test_functions.py::FunctionTests::test_viatorch, test/dynamo/test_functions.py::FunctionTests::test_zip_longest, test/dynamo/test_functions.py::FunctionTests::test_zip_reconstruct, test/dynamo/test_functions.py::DefaultsTests::test_cast_tensor_single_elem, test/dynamo/test_functions.py::DefaultsTests::test_cuda_current_device, test/dynamo/test_functions.py::DefaultsTests::test_dataclass_factory, test/dynamo/test_functions.py::DefaultsTests::test_dataclass_nested, test/dynamo/test_functions.py::DefaultsTests::test_fn_with_attr, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_construction, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_illegal_call_method, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_copy, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_difference, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_intersection, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_symmetric_difference, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_union, test/dynamo/test_functions.py::DefaultsTests::test_func_attrs, test/dynamo/test_functions.py::DefaultsTests::test_func_default_tensor_args, test/dynamo/test_functions.py::DefaultsTests::test_func_default_torch_args, test/dynamo/test_functions.py::DefaultsTests::test_functional_compile, test/dynamo/test_functions.py::DefaultsTests::test_functools_partial_id, test/dynamo/test_functions.py::DefaultsTests::test_fx_immutable_list_mutation_not_allowed, test/dynamo/test_functions.py::DefaultsTests::test_fx_map_aggregate, test/dynamo/test_functions.py::DefaultsTests::test_in_set_inplace, 
test/dynamo/test_functions.py::DefaultsTests::test_in_set_would_fail_broadcast, test/dynamo/test_functions.py::DefaultsTests::test_inspect_method_source, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_vmapped_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_vmapped_mutated_tensor_tensor_multi_arg, test/dynamo/test_functions.py::DefaultsTests::test_is_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_mutated_tensor_tensor_across_graph_break, test/dynamo/test_functions.py::DefaultsTests::test_is_not_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_vmapped_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_keyword, test/dynamo/test_functions.py::DefaultsTests::test_listlike_of_tensors_contains_constant, test/dynamo/test_functions.py::DefaultsTests::test_meth_default_tensor_args, test/dynamo/test_functions.py::DefaultsTests::test_pybind_object, test/dynamo/test_functions.py::DefaultsTests::test_reconstructed_name, test/dynamo/test_functions.py::DefaultsTests::test_set_call___init___frozenset, test/dynamo/test_functions.py::DefaultsTests::test_set_call___init___set, test/dynamo/test_functions.py::DefaultsTests::test_set_construction, test/dynamo/test_functions.py::DefaultsTests::test_skip_function_call_very_weird_value, test/dynamo/test_functions.py::DefaultsTests::test_str_handler_for_user_defined_object, test/dynamo/test_functions.py::DefaultsTests::test_sys_recursionlimit, test/dynamo/test_functions.py::DefaultsTests::test_tree_map, test/dynamo/test_functions.py::DefaultsTests::test_udf_list, test/dynamo/test_functions.py::DefaultsTests::test_udf_list_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_udf_list_slice, 
test/dynamo/test_functions.py::DefaultsTests::test_udf_namedtuple, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_construction, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_construction_custom_new, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_zip_strict
2025-09-07T07:27:42.4417234Z
2025-09-07T07:27:42.4417465Z Running inductor/test_torchinductor_opinfo 8/12 ... [2025-09-07 07:27:42.420490]
2025-09-07T07:27:42.4417888Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:27:42.4418859Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=8', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:27:42.420821]
2025-09-07T07:27:44.4713364Z
2025-09-07T07:27:44.4714882Z inductor/test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_foreach_1.1_406f1110b9233d45_.log
2025-09-07T07:27:44.4874309Z Running 534 items in this shard: test/inductor/test_foreach.py::ForeachTests::test_2d_block_mixed_sizes_with_mask, test/inductor/test_foreach.py::ForeachTests::test_2d_block_no_mixed_sizes_no_mask, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_minimum, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_add, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_add, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_add, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_aliasing, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_decomp__foreach_addcdiv, test/inductor/test_foreach.py::ForeachTests::test_decomp__foreach_addcmul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_cpp_wrapper_cuda, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_python_wrapper, test/inductor/test_foreach.py::ForeachTests::test_foreach_cpp_wrapper_cuda, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_input_mutation, 
test/inductor/test_foreach.py::ForeachTests::test_fuse_concat, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_mul, 
test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_clamp_min, 
test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_multi_device, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_neg, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sub, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_neg, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_reciprocal, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_mul_, 
test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sign, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_mul, 
test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_add, 
test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_sub, 
test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sub, 
test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_zero_elems 2025-09-07T07:27:44.5027518Z 2025-09-07T07:27:44.5027746Z Running inductor/test_torchinductor_opinfo 9/12 ... [2025-09-07 07:27:44.472130] 2025-09-07T07:27:44.5028164Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:27:44.5029244Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=9', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:27:44.472456] 2025-09-07T07:28:43.9632970Z 2025-09-07T07:28:43.9633941Z test_typing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_typing_1.1_198cc9d2e1881298_.log 2025-09-07T07:28:43.9639771Z Running 18 items in this shard: test/test_typing.py::TestTyping::test_fail_arithmetic_ops.py, test/test_typing.py::TestTyping::test_fail_creation_ops.py, test/test_typing.py::TestTyping::test_fail_random.py, test/test_typing.py::TestTyping::test_fail_torch_size.py, test/test_typing.py::TestTyping::test_reveal_module_list.py, test/test_typing.py::TestTyping::test_reveal_namedtuple.py, test/test_typing.py::TestTyping::test_reveal_opt_size.py, test/test_typing.py::TestTyping::test_reveal_size.py, test/test_typing.py::TestTyping::test_reveal_tensor_constructors.py, test/test_typing.py::TestTyping::test_reveal_tensor_copy.py, test/test_typing.py::TestTyping::test_reveal_tensor_sampling.py, test/test_typing.py::TestTyping::test_reveal_torch_optim.py, test/test_typing.py::TestTyping::test_success_arithmetic_ops.py, test/test_typing.py::TestTyping::test_success_creation_ops.py, test/test_typing.py::TestTyping::test_success_cuda_steam.py, test/test_typing.py::TestTyping::test_success_distributions.py, test/test_typing.py::TestTyping::test_success_math_ops.py, test/test_typing.py::TestTyping::test_success_torch_size.py 2025-09-07T07:28:43.9645124Z 2025-09-07T07:28:43.9645429Z Running inductor/test_torchinductor_opinfo 12/12 ... [2025-09-07 07:28:43.963295] 2025-09-07T07:28:43.9646121Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:28:43.9647415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=12', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:28:43.963667] 2025-09-07T07:29:44.4624190Z 2025-09-07T07:29:44.4625322Z inductor/test_torchinductor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_1.1_f6bb796796c8f2e3_.log 2025-09-07T07:29:44.4885597Z Running 960 items in this shard: test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_broadcast3, 
test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_broadcast2, 
test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_transposed, test/inductor/test_torchinductor.py::GPUTests::test_AllenaiLongformerBase_repro_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test__dyn_quant_matmul_4bit_cuda, test/inductor/test_torchinductor.py::GPUTests::test__dyn_quant_pack_4bit_weight_cuda, test/inductor/test_torchinductor.py::GPUTests::test__unsafe_masked_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda, test/inductor/test_torchinductor.py::GPUTests::test_abs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool1d_argmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool2d2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool_errors_with_long_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool_with_output_size_0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_max_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_max_pool2d2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_max_pool2d3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_pool_errors_with_long_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex10_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex9_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_const_float_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_add_const_int_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_inplace_permuted_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adding_tensor_offsets_cuda, test/inductor/test_torchinductor.py::GPUTests::test_addmm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_addmv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_alexnet_prefix_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aliased_buffer_reuse_cuda, test/inductor/test_torchinductor.py::GPUTests::test_allow_reuse_active_if_under_peak_cuda, test/inductor/test_torchinductor.py::GPUTests::test_allow_reuse_disable_if_exceed_peak_cuda, test/inductor/test_torchinductor.py::GPUTests::test_angle_cuda, test/inductor/test_torchinductor.py::GPUTests::test_any_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_cache_hit_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_override_registration_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_support_out_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_support_str_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_with_persistent_cache_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_with_scalar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin2_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin_with_duplicates_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin_with_nan_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_min_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_to_float_cuda, test/inductor/test_torchinductor.py::GPUTests::test_as_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_as_strided_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_alignment_op_name_fail_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_alignment_op_name_pass_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_size_stride_op_name_fail_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_size_stride_op_name_pass_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d_backward2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d_backward3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d_backward4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool3d_backward2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool3d_backward3_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_avg_pool3d_backward4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool3d_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool_errors_with_uint_cuda, test/inductor/test_torchinductor.py::GPUTests::test_baddbmm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_batch_norm_2d_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_batch_norm_2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bernoulli1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bernoulli2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bfloat16_to_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bitwise2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bitwise3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bitwise_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bmm1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bmm2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bool_cuda, test/inductor/test_torchinductor.py::GPUTests::test_both_scalars_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_add_autotune_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_computed_offsets_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_default_kwargs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int16_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int16_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int16_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int16_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_int16_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_nd_tiling_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_nd_tiling_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_buffer_batch_norm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_buffer_copied_in_graph_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_buffer_copied_in_graph_with_different_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_buffer_use_after_remove_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_float_ndigits_zero_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_int_ndigits_pos_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_int_ndigits_zero_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_empty_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_extern_kernel_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_negative_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_single_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_unbacked_2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_unbacked_empty_1d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_unbacked_legacy_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_upcasting_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cauchy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_check_stack_no_cycles_cuda, test/inductor/test_torchinductor.py::GPUTests::test_chunk_recompiles_cuda, test/inductor/test_torchinductor.py::GPUTests::test_clamp_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_clamp_type_promotion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_clamp_type_promotion_non_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_clone_cuda, test/inductor/test_torchinductor.py::GPUTests::test_compar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_complex_fallback_cuda, test/inductor/test_torchinductor.py::GPUTests::test_complex_from_real_imag_cuda, test/inductor/test_torchinductor.py::GPUTests::test_complex_memory_overlap_cuda, test/inductor/test_torchinductor.py::GPUTests::test_computed_buffer_inlining_cuda, test/inductor/test_torchinductor.py::GPUTests::test_concat_add_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_config_option_dont_assume_alignment_cuda, test/inductor/test_torchinductor.py::GPUTests::test_config_option_dont_assume_alignment_cudagraphs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_config_option_dont_assume_alignment_recompiles_cuda, test/inductor/test_torchinductor.py::GPUTests::test_consecutive_split_cumprod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_consecutive_split_cumsum_cuda, test/inductor/test_torchinductor.py::GPUTests::test_const_int32_to_float_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_1d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_2d_strides_nonpositive_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_3d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_fill_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_nd_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv2d_backward_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv2d_channels_last_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_conv3d_channels_last_use_block_ptr_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv3d_channels_last_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv3d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_bn_fuse_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_functional_bn_fuse_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_inference_heuristics_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_shape_check_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_with_as_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cos_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cudnn_rnn_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cummin_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumprod_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_inf_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_no_mask_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_pattern_matcher_issue_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_1_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_custom_op_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_default_layout_constraint_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_fixed_layout_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_fixed_layout_sequential_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_unbacked_symints_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_scan_op_compiled_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_scan_op_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_scan_op_multi_input_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_scan_would_split_cuda, test/inductor/test_torchinductor.py::GPUTests::test_data_type_propogation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dense_mask_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_deterministic_codegen_cuda, test/inductor/test_torchinductor.py::GPUTests::test_deterministic_codegen_on_graph_break_cuda, test/inductor/test_torchinductor.py::GPUTests::test_deterministic_codegen_with_suffix_cuda, test/inductor/test_torchinductor.py::GPUTests::test_device_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_diagonal_copy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dist_bf16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dist_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div7_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_div8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div9_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_by_zero_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_precision_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_presicion_accuracy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_prim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_softmax_symfloat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dont_constant_fold_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout_deterministic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout_trivial_0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout_trivial_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtype_mismatch_issue_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtype_sympy_expr_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_uint8_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int32_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_fusion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_float16_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int8_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_elu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_bag_byte_unpack_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_bag_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_sparse_cuda, test/inductor/test_torchinductor.py::GPUTests::test_empty1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_empty2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_empty_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_erfc_cuda, test/inductor/test_torchinductor.py::GPUTests::test_erfinv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_exact_stride_cuda, test/inductor/test_torchinductor.py::GPUTests::test_exp2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_exp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expand_as_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expand_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expanded_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expm1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_basic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_list_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_list_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_no_mutated_tensors_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_with_return_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fft_real_input_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fft_real_input_real_output_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fill1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fill2_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_flip_cat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_flip_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float16_to_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float32_to_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float_index_expression_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float_index_expression_type_promotion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float_repr_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_floordiv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fmin_fmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fmod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fmod_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_forced_buffer_realize_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fractional_max_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fractional_max_pool2d2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fractional_max_pool2d3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fractional_max_pool2d4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fractional_max_pool2d5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_boolean_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_like_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_like_sliced_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_like_transposed_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_truncation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_functionalize_rng_wrappers_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fuse_large_params_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fuse_tiled_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fusing_write_into_disjoint_read_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_gather1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gather2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gather3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gather_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gelu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_generate_rand_fp8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_generated_code_has_alignment_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_generated_code_has_size_stride_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_getitem_cuda, test/inductor/test_torchinductor.py::GPUTests::test_glu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_arange1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_arange2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_argmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_both_scalars_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_constant_tensor1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_constant_tensor2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_misaligned_input_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_mutation_real_name_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_no_inputs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_pad_dynamic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_refcount_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_scalar_inputs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_unbacked_symint_as_output_cuda, test/inductor/test_torchinductor.py::GPUTests::test_grid_sampler_2d_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_hardsigmoid_cuda, test/inductor/test_torchinductor.py::GPUTests::test_hardswish_cuda, test/inductor/test_torchinductor.py::GPUTests::test_hardtanh_cuda, test/inductor/test_torchinductor.py::GPUTests::test_horizonal_fusion1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_horizonal_fusion2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_abs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_device_assert_masked_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_flip_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_floordiv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_nested_indirect_indexing_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_remainder_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_as_masked_fill_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_deterministic_fallback_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_failed_reinplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_fallback1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_fallback2_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_index_put_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_reinplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_remainder_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_select_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_indirect_load_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_multiple_specializations_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_triton_bucketize_respects_masking_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inf_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inner_fn_str_and_stride_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_activations_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_add_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_mixed_dtype_ops_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_resize_as_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_where_pointwise_cuda, test/inductor/test_torchinductor.py::GPUTests::test_input_mutation1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_input_mutation2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_input_mutation3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_input_mutation4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_input_mutation5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_insignificant_strides_cuda, test/inductor/test_torchinductor.py::GPUTests::test_int8_weight_only_quant_cuda, test/inductor/test_torchinductor.py::GPUTests::test_int_input_dynamic_shapes_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_invalid_operand_issue1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_isin_tensor_scalar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_isinf2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_isinf_cuda, test/inductor/test_torchinductor.py::GPUTests::test_issue102546_cuda, test/inductor/test_torchinductor.py::GPUTests::test_kernel_names_cuda, test/inductor/test_torchinductor.py::GPUTests::test_kwargs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_l1_loss_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_broadcast_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_grid_use_block_ptr_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_grid_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_offset_pointwise_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_pointwise_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_strided_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_tensor_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_layer_norm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_leaky_relu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_lerp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_lgamma_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_rands2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_rands3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_rands_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_rands_sliced_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linear1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linear2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linear_dynamic_maxautotune_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_linear_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linear_mixed_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linspace1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linspace2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linspace3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linspace4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_list_clearing_cuda, test/inductor/test_torchinductor.py::GPUTests::test_log1p_cuda, test/inductor/test_torchinductor.py::GPUTests::test_log2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_log_fp64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_log_softmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_logaddexp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_logcumsumexp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_logcumsumexp_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_logsumexp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_long_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mark_dynamic_with_hint_override_cuda, test/inductor/test_torchinductor.py::GPUTests::test_masked_fill_cuda, test/inductor/test_torchinductor.py::GPUTests::test_masked_fill_promotion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_masked_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_matmul_layer_norm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_min_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d6_dilation_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d6_dilation_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mean_cuda, test/inductor/test_torchinductor.py::GPUTests::test_min_max_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_min_max_reduction_nan_cuda, test/inductor/test_torchinductor.py::GPUTests::test_misaligned_address_issue1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mix_device_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mm_mixed_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mm_views_cuda, test/inductor/test_torchinductor.py::GPUTests::test_move_arange_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_mul_index_expr_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mul_softmax_symfloat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multi_device_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multi_gpu_device_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multi_gpu_recompile_on_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multi_threading_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_any_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_prime_size_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_sum_low_prec_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_var_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_var_lowp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mutable_custom_op_fixed_layout_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mutations_loop_fusion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_sort_stable_False_descending_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_sort_stable_True_descending_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_sort_stable_True_descending_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_to_num_cuda, test/inductor/test_torchinductor.py::GPUTests::test_narrow_cuda, test/inductor/test_torchinductor.py::GPUTests::test_needs_contiguous_strides_cuda, test/inductor/test_torchinductor.py::GPUTests::test_neg_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_neg_max_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_new_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_new_empty_strided_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_new_ones_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nll_loss_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nll_loss_forward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_no_mega_fusion_during_lowering_cuda, test/inductor/test_torchinductor.py::GPUTests::test_no_op_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_no_specization_over_symbolic_value_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nonzero_unbacked_refinement_cuda, test/inductor/test_torchinductor.py::GPUTests::test_norm_constant_overflow_cuda, test/inductor/test_torchinductor.py::GPUTests::test_one_hot_cuda, test/inductor/test_torchinductor.py::GPUTests::test_output_strides_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pad_cast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pad_single_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pad_view_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pattern_matcher_multi_user_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pattern_matcher_unbacked_cuda, test/inductor/test_torchinductor.py::GPUTests::test_permute1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_permute2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_philox_rand_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pixel_shuffle_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_airy_ai_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_bessel_j0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_bessel_j1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_bessel_y0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_bessel_y1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_pointwise_chebyshev_polynomial_u_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_chebyshev_polynomial_v_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_digamma_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_entr_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erf_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erfc_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erfcx_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erfinv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_exp2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_expit_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_expm1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_gammainc_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_gammaincc_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_gammaln_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_hermite_polynomial_h_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_hermite_polynomial_he_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_i0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_i0e_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_i1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_i1e_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_legendre_polynomial_p_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_log1p_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_log_ndtr_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_pointwise_logit_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_i0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_i1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_k0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_k1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_multigammaln_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_ndtr_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_ndtri_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_polygamma_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_psi_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_round_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_scaled_modified_bessel_k0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_scaled_modified_bessel_k1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_t_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_u_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_v_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_w_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_sinc_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_spherical_bessel_j0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_xlog1py_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_xlogy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_zeta_cuda, test/inductor/test_torchinductor.py::GPUTests::test_polar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pow1_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_pow2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pow3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pow_by_natural_log2_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pow_int_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pow_symfloat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_prepare_softmax_with_fast_math_cuda, test/inductor/test_torchinductor.py::GPUTests::test_prod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_profiler_mark_wrapper_call_cuda, test/inductor/test_torchinductor.py::GPUTests::test_rand_like_deterministic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randint_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randint_distribution_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randint_int64_mod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randint_kernel_count_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randn_generator_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randn_like_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randn_with_dtype_and_device_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction_config_limit_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reflection_pad2d_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reflection_pad2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reinterpret_dtypeview_cuda, test/inductor/test_torchinductor.py::GPUTests::test_relu_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_remainder_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_no_ops_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_clone_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_copy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_slice1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_slice_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_slice_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_view_default_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_view_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_as_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_cuda, test/inductor/test_torchinductor.py::GPUTests::test_replication_pad_errors_with_bool_cuda, test/inductor/test_torchinductor.py::GPUTests::test_require_stride_expanded_cuda, test/inductor/test_torchinductor.py::GPUTests::test_resize_as_cuda, test/inductor/test_torchinductor.py::GPUTests::test_resize_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reuse_buffers_with_aliasing_cuda, test/inductor/test_torchinductor.py::GPUTests::test_roi_align_cuda, test/inductor/test_torchinductor.py::GPUTests::test_roll_cuda, test/inductor/test_torchinductor.py::GPUTests::test_round_correctness_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_round_cuda, test/inductor/test_torchinductor.py::GPUTests::test_rsqrt_cuda, test/inductor/test_torchinductor.py::GPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scalar_cpu_tensor_arg_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scalar_input_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scalar_output_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scaled_dot_product_attention_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_add1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_add2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_add3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_bf16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_reduce1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_reduce2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_reduce3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scheduler_vertical_fusion1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_unaligned_mask_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda, test/inductor/test_torchinductor.py::GPUTests::test_searchsorted_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_searchsorted_cuda, test/inductor/test_torchinductor.py::GPUTests::test_select_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_setitem_with_int_parameter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sgn_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sgn_extremal_cuda, test/inductor/test_torchinductor.py::GPUTests::test_shape_padding_cuda, test/inductor/test_torchinductor.py::GPUTests::test_shape_prop_torch_ones_cuda, test/inductor/test_torchinductor.py::GPUTests::test_should_pad_bench_for_bmm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sigmoid_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sign_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_signbit_cuda, test/inductor/test_torchinductor.py::GPUTests::test_silu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_simplify_loops_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sin_cuda, test/inductor/test_torchinductor.py::GPUTests::test_single_elem_cuda, test/inductor/test_torchinductor.py::GPUTests::test_single_elem_indirect_cuda, test/inductor/test_torchinductor.py::GPUTests::test_size_asserts_for_multi_output_fallback_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sizehint_issue1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_mutation1_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_slice_mutation2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_mutation3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter_dtype_consistency_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter_reinplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_view_with_graph_break_cuda, test/inductor/test_torchinductor.py::GPUTests::test_softmax_backward_data_cuda, test/inductor/test_torchinductor.py::GPUTests::test_softmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_softmax_one_kernel_loop_cuda, test/inductor/test_torchinductor.py::GPUTests::test_softmax_one_kernel_persist_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sort_bool_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sort_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sort_stable_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sort_transpose_cuda, test/inductor/test_torchinductor.py::GPUTests::test_special_polygamma_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cumprod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cumprod_low_prec_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cumsum_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cumsum_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cumsum_low_prec_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_failed_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_split_reduction_dynamic_shape_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_reduction_with_int64_size_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_integer_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_list_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_sizes_with_unbacked_symints_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_unbacked_symints_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_squeeze1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_squeeze2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_squeeze_varargs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_stack_cuda, test/inductor/test_torchinductor.py::GPUTests::test_std_cuda, test/inductor/test_torchinductor.py::GPUTests::test_stride_preservation_with_stride_modifying_fx_pass_cuda, test/inductor/test_torchinductor.py::GPUTests::test_strided_inputs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum_int_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum_keepdims_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tan_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tanh_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor3_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_tensor_index_put_slice_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor_index_slice_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tmp_not_defined_issue2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tmp_not_defined_issue3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_to_device_constant_cuda, test/inductor/test_torchinductor.py::GPUTests::test_to_device_cuda, test/inductor/test_torchinductor.py::GPUTests::test_to_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_to_memory_format_cuda, test/inductor/test_torchinductor.py::GPUTests::test_topk_cuda, test/inductor/test_torchinductor.py::GPUTests::test_torch_device_split_cuda, test/inductor/test_torchinductor.py::GPUTests::test_transpose_add_cuda, test/inductor/test_torchinductor.py::GPUTests::test_transpose_cuda, test/inductor/test_torchinductor.py::GPUTests::test_transposed_propagates_cuda, test/inductor/test_torchinductor.py::GPUTests::test_triton_kernel_bool_param_cuda, test/inductor/test_torchinductor.py::GPUTests::test_triu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_uint4x2_mixed_mm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_uint_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unbacked_floordiv_simplify_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unbacked_floordiv_simplify_errors_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unbind_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unfold_zero_dimension_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unroll_small_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unsigned_constant_tensors_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_bfloat16_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unsqueeze_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unsqueeze_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_bicubic2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_bilinear2d_a_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_bilinear2d_b_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_cat_conv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_nearest1d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_nearest2d_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_nearest2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_nearest3d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_var_correction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_var_mean_div_by_cuda, test/inductor/test_torchinductor.py::GPUTests::test_var_mean_tile_reduction_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_var_mean_tile_reduction_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_vdd_clamp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_vectorized_ops_masked_cuda, test/inductor/test_torchinductor.py::GPUTests::test_vectorized_ops_masked_var_novec_cuda, 
test/inductor/test_torchinductor.py::GPUTests::test_vertical_fusion1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_view_as_complex_cuda, test/inductor/test_torchinductor.py::GPUTests::test_view_as_real_cuda, test/inductor/test_torchinductor.py::GPUTests::test_view_detach_cuda, test/inductor/test_torchinductor.py::GPUTests::test_view_on_aliased_cuda, test/inductor/test_torchinductor.py::GPUTests::test_view_uint8_through_differing_bitwidths_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_weight_norm_bwd_cuda, test/inductor/test_torchinductor.py::GPUTests::test_where_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_where_with_logical_op_cuda, test/inductor/test_torchinductor.py::GPUTests::test_xblock_divides_xnumel_cuda, test/inductor/test_torchinductor.py::GPUTests::test_zero_dim_reductions_cuda, test/inductor/test_torchinductor.py::GPUTests::test_zero_element_mutation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_zeros_cuda, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_bandwidth_profiler, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_cant_optimize_compute, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_codegen_config_option_dont_assume_alignment, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_comment_graph_fragment, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_computed_indirect_mask, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_constant_folding_deallocation, 
test/inductor/test_torchinductor.py::TritonCodeGenTests::test_ctr_not_moved_to_cuda_when_used_in_index_put, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_divisible_by_16_covers_numel_args, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_donated_buffer_inplace, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_donated_buffer_inplace_gpt, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_evict_last_non_coalesced_loads, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_evict_last_non_coalesced_loads_block_ptr, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_grouped_mm, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_has_constant_mask_block_multiple_False_ynumel_exceed_ygrid_size_False, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_has_constant_mask_block_multiple_True_ynumel_exceed_ygrid_size_False, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_has_constant_mask_block_multiple_True_ynumel_exceed_ygrid_size_True, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_indirect_device_assert, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_inductor_detach_view_backend_aot_eager, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_inductor_detach_view_backend_inductor, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_inductor_sequence_nr, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_kernel_names_descriptive, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_layer_norm_inplaces_after_matmul, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_non_blocking_copy_codegen, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_not_materialize_pointwise_reduction, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_numpy_autograd, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_numpy_on_gpu, 
test/inductor/test_torchinductor.py::TritonCodeGenTests::test_optimize_compute, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_optimize_indexing_assert, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_optimize_indexing_dtype, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_optimize_indexing_dtype_with_constraint, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_red_followed_by_transposed_pointwise, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_respect_scaled_grouped_mm_layout_tag, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_rope_fusion, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_sdpa_inference_mode_aot_compile, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_skip_l1_cache, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_split_op_with_sym, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_triton_attrs_dict_constexpr_signature, test/inductor/test_torchinductor.py::RNNTest::test_rnn_compile_safe, test/inductor/test_torchinductor.py::NanCheckerTest::test_nan_checker_fail, test/inductor/test_torchinductor.py::NanCheckerTest::test_nan_checker_pass 2025-09-07T07:29:44.5124393Z 2025-09-07T07:29:44.5124577Z Running dynamo/test_dicts 1/1 ... [2025-09-07 07:29:44.463692] 2025-09-07T07:29:44.5125095Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:29:44.5126012Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dicts.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:29:44.464026] 2025-09-07T07:29:48.5343128Z 2025-09-07T07:29:48.5344227Z dynamo/test_dicts 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dicts_1.1_8a2519a4df72fa58_.log 2025-09-07T07:29:48.5374981Z Running 126 items in this shard: test/dynamo/test_dicts.py::DictTests::test_builtin_ior_, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_diff_keys, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_invalid_types, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_same_keys, test/dynamo/test_dicts.py::DictTests::test_construct_user_dict_and_return, test/dynamo/test_dicts.py::DictTests::test_contains_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_contains_module_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_custom_iter_dict, test/dynamo/test_dicts.py::DictTests::test_custom_keys_iter_dict, test/dynamo/test_dicts.py::DictTests::test_dict_construction_from_mapping_proxy, test/dynamo/test_dicts.py::DictTests::test_dict_contains, test/dynamo/test_dicts.py::DictTests::test_dict_copy_alias, test/dynamo/test_dicts.py::DictTests::test_dict_guard_on_keys_order, test/dynamo/test_dicts.py::DictTests::test_dict_guard_on_keys_order2, test/dynamo/test_dicts.py::DictTests::test_dict_iter, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_and_, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_or_, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_sub, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_xor, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_iand, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_ior, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_isub, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_ixor, test/dynamo/test_dicts.py::DictTests::test_dict_list_values, test/dynamo/test_dicts.py::DictTests::test_dict_mutation_side_effect, 
test/dynamo/test_dicts.py::DictTests::test_dict_namedtuple, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys_modules, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys_tensors, test/dynamo/test_dicts.py::DictTests::test_dict_reconstruct_keeps_original_order, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_contains, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_get_method, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_initialization_in_graph, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_instantiation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_instantiation_return, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_local_mutation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_local_with_non_dict_method, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_methods_fallback_mutation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_methods_fallback_readonly, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_setitem, test/dynamo/test_dicts.py::DictTests::test_dict_tag_guard, test/dynamo/test_dicts.py::DictTests::test_empty_dict_recompilation, test/dynamo/test_dicts.py::DictTests::test_fn_id, test/dynamo/test_dicts.py::DictTests::test_items_type, test/dynamo/test_dicts.py::DictTests::test_lazy_key_guarding, test/dynamo/test_dicts.py::DictTests::test_lazy_key_non_const_guarding, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_ban_muation_on_dict_realization, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing_local_mutation, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing_mutation, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_for_local, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_for_nonlocal, test/dynamo/test_dicts.py::DictTests::test_move_to_end, 
test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict, test/dynamo/test_dicts.py::DictTests::test_ordered_dict_reordered_keys, test/dynamo/test_dicts.py::DictTests::test_ordered_dict_subclass_reordered_keys, test/dynamo/test_dicts.py::DictTests::test_overridden_get_item, test/dynamo/test_dicts.py::DictTests::test_udf_dict_reconstruction, test/dynamo/test_dicts.py::DictTests::test_update_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_update_module_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_weakref_dict, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_eq, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_ior, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_ne, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_or, test/dynamo/test_dicts.py::DictGuardTests::test_popitem, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_or, test/dynamo/test_dicts.py::DictMethodsTests::test_clear, test/dynamo/test_dicts.py::DictMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::DictMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::DictMethodsTests::test_copy, test/dynamo/test_dicts.py::DictMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::DictMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::DictMethodsTests::test_get, test/dynamo/test_dicts.py::DictMethodsTests::test_items, test/dynamo/test_dicts.py::DictMethodsTests::test_keys, test/dynamo/test_dicts.py::DictMethodsTests::test_pop, test/dynamo/test_dicts.py::DictMethodsTests::test_popitem, test/dynamo/test_dicts.py::DictMethodsTests::test_setdefault, test/dynamo/test_dicts.py::DictMethodsTests::test_type, test/dynamo/test_dicts.py::DictMethodsTests::test_update, test/dynamo/test_dicts.py::DictMethodsTests::test_values, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_ior, 
test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_or, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_clear, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_copy, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_get, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_items, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_keys, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_pop, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_popitem, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_setdefault, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_type, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_update, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_values, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior_return_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_or, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_or_return_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_clear, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_eq_order, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_copy, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_fromkeys, 
test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_get, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_items, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_keys, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_move_to_end, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_pop, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_popitem, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_popitem_kwarg, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_setdefault, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_update, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_values, test/dynamo/test_dicts.py::OrderedDictSubclassOverload::test_move_to_end 2025-09-07T07:29:48.5402043Z 2025-09-07T07:29:48.5402297Z Running dynamo/test_sdpa 1/1 ... [2025-09-07 07:29:48.534395] 2025-09-07T07:29:48.5402660Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:29:48.5403567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sdpa.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:29:48.534767] 2025-09-07T07:29:52.4045841Z 2025-09-07T07:29:52.4046894Z dynamo/test_sdpa 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sdpa_1.1_aa032352c5b62038_.log 2025-09-07T07:29:52.4049298Z Running 4 items in this shard: test/dynamo/test_sdpa.py::TestSDPA::test_graph_break_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_input_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_intermediate_attr_access_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_returns_SDPAParams 2025-09-07T07:29:52.4050664Z 2025-09-07T07:29:52.4050877Z Running dynamo/test_list 1/1 ... 
[2025-09-07 07:29:52.404620] 2025-09-07T07:29:52.4051280Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:29:52.4052367Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_list.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:29:52.405006] 2025-09-07T07:29:56.4252703Z 2025-09-07T07:29:56.4253774Z dynamo/test_list 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_list_1.1_714d81c1a16e1829_.log 2025-09-07T07:29:56.4263433Z Running 39 items in this shard: test/dynamo/test_list.py::TupleTests::test___contains__, test/dynamo/test_list.py::TupleTests::test___getitem__, test/dynamo/test_list.py::TupleTests::test_binop_add, test/dynamo/test_list.py::TupleTests::test_binop_imul, test/dynamo/test_list.py::TupleTests::test_cmp_eq, test/dynamo/test_list.py::TupleTests::test_cmp_greater_than, test/dynamo/test_list.py::TupleTests::test_cmp_greater_than_or_equal, test/dynamo/test_list.py::TupleTests::test_cmp_less_than, test/dynamo/test_list.py::TupleTests::test_cmp_less_than_or_equal, test/dynamo/test_list.py::TupleTests::test_cmp_ne, test/dynamo/test_list.py::TupleTests::test_count, test/dynamo/test_list.py::TupleTests::test_index, test/dynamo/test_list.py::ListTests::test___contains__, test/dynamo/test_list.py::ListTests::test___delitem__, test/dynamo/test_list.py::ListTests::test___getitem__, test/dynamo/test_list.py::ListTests::test___setitem__, test/dynamo/test_list.py::ListTests::test_append, test/dynamo/test_list.py::ListTests::test_binop_add, test/dynamo/test_list.py::ListTests::test_binop_delitem_global_list, test/dynamo/test_list.py::ListTests::test_binop_iadd, test/dynamo/test_list.py::ListTests::test_binop_iadd_global_list, test/dynamo/test_list.py::ListTests::test_binop_imul, test/dynamo/test_list.py::ListTests::test_binop_imul_global_list, 
test/dynamo/test_list.py::ListTests::test_clear, test/dynamo/test_list.py::ListTests::test_cmp_eq, test/dynamo/test_list.py::ListTests::test_cmp_greater_than, test/dynamo/test_list.py::ListTests::test_cmp_greater_than_or_equal, test/dynamo/test_list.py::ListTests::test_cmp_less_than, test/dynamo/test_list.py::ListTests::test_cmp_less_than_or_equal, test/dynamo/test_list.py::ListTests::test_cmp_ne, test/dynamo/test_list.py::ListTests::test_copy, test/dynamo/test_list.py::ListTests::test_count, test/dynamo/test_list.py::ListTests::test_extend, test/dynamo/test_list.py::ListTests::test_index, test/dynamo/test_list.py::ListTests::test_insert, test/dynamo/test_list.py::ListTests::test_pop, test/dynamo/test_list.py::ListTests::test_remove, test/dynamo/test_list.py::ListTests::test_reverse, test/dynamo/test_list.py::ListTests::test_sort 2025-09-07T07:29:56.4271135Z 2025-09-07T07:29:56.4271354Z Running inductor/test_autoheuristic 1/1 ... [2025-09-07 07:29:56.425308] 2025-09-07T07:29:56.4271909Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:29:56.4272983Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_autoheuristic.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:29:56.425733] 2025-09-07T07:30:03.2998174Z 2025-09-07T07:30:03.2999084Z inductor/test_autoheuristic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_autoheuristic_1.1_2d8df18b97ba5bb4_.log 2025-09-07T07:30:03.2999968Z Running 0 items in this shard: 2025-09-07T07:30:03.3000200Z 2025-09-07T07:30:03.3000775Z Running test_flop_counter 1/1 ... 
[2025-09-07 07:30:03.299750] 2025-09-07T07:30:03.3001209Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:03.3002757Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_flop_counter.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:03.300073] 2025-09-07T07:30:07.2200343Z 2025-09-07T07:30:07.2201446Z test_flop_counter 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_flop_counter_1.1_3f8dcf741b4fc7db_.log 2025-09-07T07:30:07.2208710Z Running 22 items in this shard: test/test_flop_counter.py::TestFlopCounter::test_addmm_out, test/test_flop_counter.py::TestFlopCounter::test_autograd_op, test/test_flop_counter.py::TestFlopCounter::test_backward, test/test_flop_counter.py::TestFlopCounter::test_backward_reset, test/test_flop_counter.py::TestFlopCounter::test_conv_backwards_as_decomposition, test/test_flop_counter.py::TestFlopCounter::test_conv_transpose_loop, test/test_flop_counter.py::TestFlopCounter::test_convs, test/test_flop_counter.py::TestFlopCounter::test_custom, test/test_flop_counter.py::TestFlopCounter::test_custom_op, test/test_flop_counter.py::TestFlopCounter::test_flop_counter_variety, test/test_flop_counter.py::TestFlopCounter::test_hook_registration, test/test_flop_counter.py::TestFlopCounter::test_inference_mode, test/test_flop_counter.py::TestFlopCounter::test_module, test/test_flop_counter.py::TestFlopCounter::test_nested_attention_fake_tensors, test/test_flop_counter.py::TestFlopCounter::test_noop, test/test_flop_counter.py::TestFlopCounter::test_op, test/test_flop_counter.py::TestFlopCounter::test_pytrees, test/test_flop_counter.py::TestFlopCounter::test_scaled_mm, test/test_flop_counter.py::TestFlopCounter::test_sdpa, test/test_flop_counter.py::TestFlopCounter::test_sdpa_nested_tensor, 
test/test_flop_counter.py::TestFlopCounter::test_torchscript, test/test_flop_counter.py::TestFlopCounter::test_warning 2025-09-07T07:30:07.2215079Z 2025-09-07T07:30:07.2215316Z Running dynamo/test_fx_graph_runnable 1/1 ... [2025-09-07 07:30:07.220046] 2025-09-07T07:30:07.2215777Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:07.2217240Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_graph_runnable.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:07.220370] 2025-09-07T07:30:07.8309391Z 2025-09-07T07:30:07.8310571Z inductor/test_torchinductor_opinfo 9/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_9.12_1abe44c6db2dbb94_.log 2025-09-07T07:30:07.8418869Z Running 275 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exponential_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_det_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvals_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svdvals_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vector_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_multinomial_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_glu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_quantile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_renorm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_cosine_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_kaiser_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_indices_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float64 2025-09-07T07:30:07.8522649Z 2025-09-07T07:30:07.8522841Z Running inductor/test_ordered_set 1/1 ... [2025-09-07 07:30:07.831401] 2025-09-07T07:30:07.8523223Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:07.8524134Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_ordered_set.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:07.831705] 2025-09-07T07:30:12.6532177Z 2025-09-07T07:30:12.6533748Z inductor/test_ordered_set 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_ordered_set_1.1_0436eff5a163da2e_.log 2025-09-07T07:30:12.6636867Z Running 401 items in this shard: test/inductor/test_ordered_set.py::TestJointOps::test_and, test/inductor/test_ordered_set.py::TestJointOps::test_badcmp, test/inductor/test_ordered_set.py::TestJointOps::test_container_iterator, test/inductor/test_ordered_set.py::TestJointOps::test_contains, test/inductor/test_ordered_set.py::TestJointOps::test_cyclical_repr, test/inductor/test_ordered_set.py::TestJointOps::test_deepcopy, test/inductor/test_ordered_set.py::TestJointOps::test_difference, test/inductor/test_ordered_set.py::TestJointOps::test_do_not_rehash_dict_keys, test/inductor/test_ordered_set.py::TestJointOps::test_equality, test/inductor/test_ordered_set.py::TestJointOps::test_free_after_iterating, test/inductor/test_ordered_set.py::TestJointOps::test_gc, test/inductor/test_ordered_set.py::TestJointOps::test_intersection, test/inductor/test_ordered_set.py::TestJointOps::test_isdisjoint, test/inductor/test_ordered_set.py::TestJointOps::test_iterator_pickling, test/inductor/test_ordered_set.py::TestJointOps::test_len, test/inductor/test_ordered_set.py::TestJointOps::test_new_or_init, test/inductor/test_ordered_set.py::TestJointOps::test_or, test/inductor/test_ordered_set.py::TestJointOps::test_pickling, test/inductor/test_ordered_set.py::TestJointOps::test_setOfFrozensets, test/inductor/test_ordered_set.py::TestJointOps::test_sub, test/inductor/test_ordered_set.py::TestJointOps::test_sub_and_super, test/inductor/test_ordered_set.py::TestJointOps::test_subclass_with_custom_hash, test/inductor/test_ordered_set.py::TestJointOps::test_symmetric_difference, test/inductor/test_ordered_set.py::TestJointOps::test_union, test/inductor/test_ordered_set.py::TestJointOps::test_uniquification, 
test/inductor/test_ordered_set.py::TestJointOps::test_xor, test/inductor/test_ordered_set.py::TestSet::test_add, test/inductor/test_ordered_set.py::TestSet::test_and, test/inductor/test_ordered_set.py::TestSet::test_badcmp, test/inductor/test_ordered_set.py::TestSet::test_clear, test/inductor/test_ordered_set.py::TestSet::test_constructor_identity, test/inductor/test_ordered_set.py::TestSet::test_container_iterator, test/inductor/test_ordered_set.py::TestSet::test_contains, test/inductor/test_ordered_set.py::TestSet::test_copy, test/inductor/test_ordered_set.py::TestSet::test_cyclical_repr, test/inductor/test_ordered_set.py::TestSet::test_deepcopy, test/inductor/test_ordered_set.py::TestSet::test_difference, test/inductor/test_ordered_set.py::TestSet::test_difference_update, test/inductor/test_ordered_set.py::TestSet::test_discard, test/inductor/test_ordered_set.py::TestSet::test_do_not_rehash_dict_keys, test/inductor/test_ordered_set.py::TestSet::test_equality, test/inductor/test_ordered_set.py::TestSet::test_free_after_iterating, test/inductor/test_ordered_set.py::TestSet::test_gc, test/inductor/test_ordered_set.py::TestSet::test_hash, test/inductor/test_ordered_set.py::TestSet::test_iand, test/inductor/test_ordered_set.py::TestSet::test_init, test/inductor/test_ordered_set.py::TestSet::test_inplace_on_self, test/inductor/test_ordered_set.py::TestSet::test_intersection, test/inductor/test_ordered_set.py::TestSet::test_intersection_update, test/inductor/test_ordered_set.py::TestSet::test_ior, test/inductor/test_ordered_set.py::TestSet::test_isdisjoint, test/inductor/test_ordered_set.py::TestSet::test_isub, test/inductor/test_ordered_set.py::TestSet::test_iterator_pickling, test/inductor/test_ordered_set.py::TestSet::test_ixor, test/inductor/test_ordered_set.py::TestSet::test_len, test/inductor/test_ordered_set.py::TestSet::test_new_or_init, test/inductor/test_ordered_set.py::TestSet::test_or, test/inductor/test_ordered_set.py::TestSet::test_pickling, 
test/inductor/test_ordered_set.py::TestSet::test_pop, test/inductor/test_ordered_set.py::TestSet::test_remove, test/inductor/test_ordered_set.py::TestSet::test_remove_keyerror_set, test/inductor/test_ordered_set.py::TestSet::test_remove_keyerror_unpacking, test/inductor/test_ordered_set.py::TestSet::test_rich_compare, test/inductor/test_ordered_set.py::TestSet::test_setOfFrozensets, test/inductor/test_ordered_set.py::TestSet::test_set_literal, test/inductor/test_ordered_set.py::TestSet::test_set_literal_evaluation_order, test/inductor/test_ordered_set.py::TestSet::test_set_literal_insertion_order, test/inductor/test_ordered_set.py::TestSet::test_sub, test/inductor/test_ordered_set.py::TestSet::test_sub_and_super, test/inductor/test_ordered_set.py::TestSet::test_subclass_with_custom_hash, test/inductor/test_ordered_set.py::TestSet::test_symmetric_difference, test/inductor/test_ordered_set.py::TestSet::test_symmetric_difference_update, test/inductor/test_ordered_set.py::TestSet::test_union, test/inductor/test_ordered_set.py::TestSet::test_uniquification, test/inductor/test_ordered_set.py::TestSet::test_update, test/inductor/test_ordered_set.py::TestSet::test_weakref, test/inductor/test_ordered_set.py::TestSet::test_xor, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_empty_difference_rev, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_intersection_empty, 
test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_iteration, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_length, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_self_equality, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_self_union, test/inductor/test_ordered_set.py::TestBasicOpsEmpty::test_union_empty, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_empty_difference_rev, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_in, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_intersection_empty, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_iteration, 
test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_length, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_not_in, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_self_equality, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_self_union, test/inductor/test_ordered_set.py::TestBasicOpsSingleton::test_union_empty, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_empty_difference_rev, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_in, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_intersection_empty, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_iteration, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_length, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_not_in, 
test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_self_equality, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_self_union, test/inductor/test_ordered_set.py::TestBasicOpsTuple::test_union_empty, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_empty_difference_rev, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_intersection_empty, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_iteration, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_length, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_self_equality, 
test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_self_union, test/inductor/test_ordered_set.py::TestBasicOpsTriple::test_union_empty, test/inductor/test_ordered_set.py::TestBasicOpsString::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsString::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsString::test_empty_difference_rev, test/inductor/test_ordered_set.py::TestBasicOpsString::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsString::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsString::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsString::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsString::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsString::test_intersection_empty, test/inductor/test_ordered_set.py::TestBasicOpsString::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsString::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsString::test_iteration, test/inductor/test_ordered_set.py::TestBasicOpsString::test_length, test/inductor/test_ordered_set.py::TestBasicOpsString::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsString::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsString::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsString::test_self_equality, test/inductor/test_ordered_set.py::TestBasicOpsString::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsString::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsString::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsString::test_self_union, 
test/inductor/test_ordered_set.py::TestBasicOpsString::test_union_empty, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_empty_difference_rev, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_intersection_empty, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_iteration, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_length, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_self_equality, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_self_union, test/inductor/test_ordered_set.py::TestBasicOpsBytes::test_union_empty, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_copy, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_empty_difference, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_empty_difference_rev, 
test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_empty_intersection, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_empty_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_empty_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_empty_union, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_equivalent_equality, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_intersection_empty, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_isdisjoint_empty, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_issue_37219, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_iteration, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_length, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_pickling, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_repr, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_self_difference, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_self_equality, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_self_intersection, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_self_isdisjoint, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_self_symmetric_difference, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_self_union, test/inductor/test_ordered_set.py::TestBasicOpsMixedStringBytes::test_union_empty, test/inductor/test_ordered_set.py::TestExceptionPropagation::test_changingSizeWhileIterating, test/inductor/test_ordered_set.py::TestExceptionPropagation::test_instanceWithException, test/inductor/test_ordered_set.py::TestExceptionPropagation::test_instancesWithoutException, test/inductor/test_ordered_set.py::TestSetOfSets::test_constructor, 
test/inductor/test_ordered_set.py::TestBinaryOps::test_eq, test/inductor/test_ordered_set.py::TestBinaryOps::test_intersection_non_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_intersection_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_intersection_subset, test/inductor/test_ordered_set.py::TestBinaryOps::test_intersection_superset, test/inductor/test_ordered_set.py::TestBinaryOps::test_isdisjoint_non_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_isdisjoint_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_isdisjoint_subset, test/inductor/test_ordered_set.py::TestBinaryOps::test_isdisjoint_superset, test/inductor/test_ordered_set.py::TestBinaryOps::test_sym_difference_non_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_sym_difference_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_sym_difference_subset, test/inductor/test_ordered_set.py::TestBinaryOps::test_sym_difference_superset, test/inductor/test_ordered_set.py::TestBinaryOps::test_union_non_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_union_overlap, test/inductor/test_ordered_set.py::TestBinaryOps::test_union_subset, test/inductor/test_ordered_set.py::TestBinaryOps::test_union_superset, test/inductor/test_ordered_set.py::TestUpdateOps::test_difference_method_call, test/inductor/test_ordered_set.py::TestUpdateOps::test_difference_non_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_difference_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_difference_subset, test/inductor/test_ordered_set.py::TestUpdateOps::test_difference_superset, test/inductor/test_ordered_set.py::TestUpdateOps::test_intersection_method_call, test/inductor/test_ordered_set.py::TestUpdateOps::test_intersection_non_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_intersection_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_intersection_subset, 
test/inductor/test_ordered_set.py::TestUpdateOps::test_intersection_superset, test/inductor/test_ordered_set.py::TestUpdateOps::test_sym_difference_method_call, test/inductor/test_ordered_set.py::TestUpdateOps::test_sym_difference_non_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_sym_difference_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_sym_difference_subset, test/inductor/test_ordered_set.py::TestUpdateOps::test_sym_difference_superset, test/inductor/test_ordered_set.py::TestUpdateOps::test_union_method_call, test/inductor/test_ordered_set.py::TestUpdateOps::test_union_non_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_union_overlap, test/inductor/test_ordered_set.py::TestUpdateOps::test_union_subset, test/inductor/test_ordered_set.py::TestUpdateOps::test_union_superset, test/inductor/test_ordered_set.py::TestMutate::test_add_absent, test/inductor/test_ordered_set.py::TestMutate::test_add_present, test/inductor/test_ordered_set.py::TestMutate::test_add_until_full, test/inductor/test_ordered_set.py::TestMutate::test_clear, test/inductor/test_ordered_set.py::TestMutate::test_discard_absent, test/inductor/test_ordered_set.py::TestMutate::test_discard_present, test/inductor/test_ordered_set.py::TestMutate::test_pop, test/inductor/test_ordered_set.py::TestMutate::test_remove_absent, test/inductor/test_ordered_set.py::TestMutate::test_remove_present, test/inductor/test_ordered_set.py::TestMutate::test_remove_until_empty, test/inductor/test_ordered_set.py::TestMutate::test_update_empty_tuple, test/inductor/test_ordered_set.py::TestMutate::test_update_unit_tuple_non_overlap, test/inductor/test_ordered_set.py::TestMutate::test_update_unit_tuple_overlap, test/inductor/test_ordered_set.py::TestSubsets::test_issubset, test/inductor/test_ordered_set.py::TestSubsetEqualEmpty::test_issubset, test/inductor/test_ordered_set.py::TestSubsetEqualNonEmpty::test_issubset, 
test/inductor/test_ordered_set.py::TestSubsetEmptyNonEmpty::test_issubset, test/inductor/test_ordered_set.py::TestSubsetPartial::test_issubset, test/inductor/test_ordered_set.py::TestSubsetNonOverlap::test_issubset, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_difference, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_eq_ne, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_ge_gt_le_lt, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_intersection, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_intersection_update, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_intersection_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_sym_difference, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_sym_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_sym_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_union, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_update, test/inductor/test_ordered_set.py::TestOnlySetsNumeric::test_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_difference, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_eq_ne, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_ge_gt_le_lt, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_intersection, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_intersection_update, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_intersection_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_sym_difference, 
test/inductor/test_ordered_set.py::TestOnlySetsDict::test_sym_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_sym_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_union, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_update, test/inductor/test_ordered_set.py::TestOnlySetsDict::test_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_difference, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_eq_ne, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_ge_gt_le_lt, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_intersection, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_intersection_update, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_intersection_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_sym_difference, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_sym_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_sym_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_union, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_update, test/inductor/test_ordered_set.py::TestOnlySetsOperator::test_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_difference, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_eq_ne, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_ge_gt_le_lt, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_intersection, 
test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_intersection_update, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_intersection_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_sym_difference, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_sym_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_sym_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_union, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_update, test/inductor/test_ordered_set.py::TestOnlySetsTuple::test_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsString::test_difference, test/inductor/test_ordered_set.py::TestOnlySetsString::test_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsString::test_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsString::test_eq_ne, test/inductor/test_ordered_set.py::TestOnlySetsString::test_ge_gt_le_lt, test/inductor/test_ordered_set.py::TestOnlySetsString::test_intersection, test/inductor/test_ordered_set.py::TestOnlySetsString::test_intersection_update, test/inductor/test_ordered_set.py::TestOnlySetsString::test_intersection_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsString::test_sym_difference, test/inductor/test_ordered_set.py::TestOnlySetsString::test_sym_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsString::test_sym_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsString::test_union, test/inductor/test_ordered_set.py::TestOnlySetsString::test_update, test/inductor/test_ordered_set.py::TestOnlySetsString::test_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_difference, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_difference_update_operator, 
test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_eq_ne, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_ge_gt_le_lt, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_intersection, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_intersection_update, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_intersection_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_sym_difference, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_sym_difference_update, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_sym_difference_update_operator, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_union, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_update, test/inductor/test_ordered_set.py::TestOnlySetsGenerator::test_update_operator, test/inductor/test_ordered_set.py::TestCopyingEmpty::test_copy, test/inductor/test_ordered_set.py::TestCopyingEmpty::test_deep_copy, test/inductor/test_ordered_set.py::TestCopyingSingleton::test_copy, test/inductor/test_ordered_set.py::TestCopyingSingleton::test_deep_copy, test/inductor/test_ordered_set.py::TestCopyingTriple::test_copy, test/inductor/test_ordered_set.py::TestCopyingTriple::test_deep_copy, test/inductor/test_ordered_set.py::TestCopyingTuple::test_copy, test/inductor/test_ordered_set.py::TestCopyingTuple::test_deep_copy, test/inductor/test_ordered_set.py::TestCopyingNested::test_copy, test/inductor/test_ordered_set.py::TestCopyingNested::test_deep_copy, test/inductor/test_ordered_set.py::TestIdentities::test_binopsVsSubsets, test/inductor/test_ordered_set.py::TestIdentities::test_commutativity, test/inductor/test_ordered_set.py::TestIdentities::test_exclusion, test/inductor/test_ordered_set.py::TestIdentities::test_summations, test/inductor/test_ordered_set.py::TestVariousIteratorArgs::test_constructor, test/inductor/test_ordered_set.py::TestVariousIteratorArgs::test_inline_methods, 
test/inductor/test_ordered_set.py::TestVariousIteratorArgs::test_inplace_methods, test/inductor/test_ordered_set.py::TestWeirdBugs::test_8420_set_merge, test/inductor/test_ordered_set.py::TestWeirdBugs::test_iter_and_mutate, test/inductor/test_ordered_set.py::TestWeirdBugs::test_merge_and_mutate, test/inductor/test_ordered_set.py::TestGraphs::test_cube, test/inductor/test_ordered_set.py::TestGraphs::test_cuboctahedron 2025-09-07T07:30:12.6732212Z 2025-09-07T07:30:12.6732410Z Running dynamo/test_recompiles 1/1 ... [2025-09-07 07:30:12.653619] 2025-09-07T07:30:12.6732914Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:12.6734012Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_recompiles.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:12.653933] 2025-09-07T07:30:14.2452670Z 2025-09-07T07:30:14.2454253Z dynamo/test_fx_graph_runnable 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_graph_runnable_1.1_604590c274a868a2_.log 2025-09-07T07:30:14.2460506Z Running 15 items in this shard: test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_all_gather_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_all_reduce_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_basic_tensor_add, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_broadcast_add_dynamic, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_broadcast_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dtensor_compile_redistribute, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dynamic_shapes_run, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_reduce_scatter_collective, 
test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_scalar_multiply, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_basic, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_batch_processing, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_dynamic_batch, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_two_inputs_matmul, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_user_defined_triton_kernel, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_user_defined_triton_kernel_autotune 2025-09-07T07:30:14.2467425Z 2025-09-07T07:30:14.2467664Z Running test_per_overload_api 1/1 ... [2025-09-07 07:30:14.245206] 2025-09-07T07:30:14.2468143Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:14.2469344Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_per_overload_api.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:14.245606] 2025-09-07T07:30:16.6241011Z 2025-09-07T07:30:16.6242283Z dynamo/test_recompiles 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompiles_1.1_9e22ee916f4d08e0_.log 2025-09-07T07:30:16.6250282Z Running 18 items in this shard: test/dynamo/test_recompiles.py::RecompileTests::test_aliasing_guard_failures, test/dynamo/test_recompiles.py::RecompileTests::test_aliasing_guard_failures_with_globals, test/dynamo/test_recompiles.py::RecompileTests::test_ambient_autocast_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_autocast_constant_fold, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_on_closed_ints, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_reduce_recompiles, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_oblivious, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_oblivious_fail_counterfactual, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_unbacked, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_tensor_scalar_change, test/dynamo/test_recompiles.py::RecompileTests::test_dunder_call_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_dynamic_shape_parameter_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_inline_inbuilt_nn_modules_candidate, test/dynamo/test_recompiles.py::RecompileTests::test_no_recompile_over_unused_objects, test/dynamo/test_recompiles.py::RecompileTests::test_no_recursive_compile_after_cache_limit_hit, test/dynamo/test_recompiles.py::RecompileTests::test_recompiles_true_false_flop, test/dynamo/test_recompiles.py::RecompileTests::test_run_mode_after_cache_limit_hit, test/dynamo/test_recompiles.py::RecompileTests::test_simple_module_recompile 2025-09-07T07:30:16.6256973Z 2025-09-07T07:30:16.6257179Z Running inductor/test_xpu_basic 1/1 ... 
[2025-09-07 07:30:16.624001] 2025-09-07T07:30:16.6257591Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:16.6258623Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_xpu_basic.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:16.624308] 2025-09-07T07:30:17.8155513Z 2025-09-07T07:30:17.8156613Z test_per_overload_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_per_overload_api_1.1_fdaed0d23b8ca962_.log 2025-09-07T07:30:17.8158520Z Running 3 items in this shard: test/test_per_overload_api.py::TestPerOverloadAPI::test_basics_opoverload, test/test_per_overload_api.py::TestPerOverloadAPI::test_basics_opoverloadpacket, test/test_per_overload_api.py::TestPerOverloadAPI::test_decompose 2025-09-07T07:30:17.8159744Z 2025-09-07T07:30:17.8160003Z Running export/test_cpp_serdes 1/1 ... [2025-09-07 07:30:17.815580] 2025-09-07T07:30:17.8160552Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:17.8161680Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_cpp_serdes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:17.815941] 2025-09-07T07:30:23.7494662Z 2025-09-07T07:30:23.7495851Z inductor/test_xpu_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_xpu_basic_1.1_d6a5c00d32c83398_.log 2025-09-07T07:30:23.7496617Z 2025-09-07T07:30:23.7496854Z Running inductor/test_utils 1/1 ... 
[2025-09-07 07:30:23.749428] 2025-09-07T07:30:23.7497297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:23.7499699Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:23.749736] 2025-09-07T07:30:26.7933749Z 2025-09-07T07:30:26.7935000Z export/test_cpp_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_cpp_serdes_1.1_3a03f98b7a51fe44_.log 2025-09-07T07:30:26.8067542Z Running 407 items in this shard: test/export/test_cpp_serdes.py::CppSerdesTestExport::test__scaled_dot_product_flash_attention_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_additional_inputs_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_allow_explicit_guards_as_runtime_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_args_type_checked_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_aten_lift_fresh_copy_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_attention_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_attr_assignment_extra_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_constrain_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_constant_relation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_linear_relation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_simple_equality_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_baddbmm_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_non_strict_fake_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_non_strict_real_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_bincount_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_buffer_util_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_constructor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_constructor_torch_ir_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_wrong_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_ccode_python_mod_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_check_specialized_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_checks_to_constrain_range_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cleanup_dynamic_markers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_colin_unbacked_backed_vr_sub_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_colon_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_compiling_state_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_access_identical_symint_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_branches_return_constant_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_branches_return_same_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_contains_unbacked_no_escape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_int_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_unflatten_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_with_module_stack_export_with_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_with_module_stack_export_with_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_aliasing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_input_naming_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_no_user_inp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_output_dup_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_requires_grad_const_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_return_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_with_non_functional_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_with_non_functional_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_in_eager_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_with_constrain_value_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_with_various_cases_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_conv_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_crop_like_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cse_for_symint_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_functionalize_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_functionalize_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_warn_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_preserve_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_tag_metadata_re_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_batch_norm_functional_predispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_item_in_prim_after_decomposition_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_item_in_prim_before_decomposition_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_default_decomposition_core_cia_ops_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_1_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_integer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_repeat_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_simplified_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_repeat_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_nonstrict_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_nonstrict_with_stacktrace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_strict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_gpu_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_mutation_float_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_static_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_1_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_auto_and_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_divisibility_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_hint_range_violations_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_hint_ranges_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_disable_forced_specializations_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_disable_forced_specializations_ok_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_gather_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_gather_into_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_reduce_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_to_all_single_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_reduce_scatter_tensor_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dont_duck_size_for_auto_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_double_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_aliasing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_list_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_with_nan_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_fake_kernel_inference_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_infers_fake_kernel_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_duplicate_modules_with_non_persistent_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_lr_shift_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_bounds_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_dataclass_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_inferred_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_generic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_user_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_various_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_spec_with_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_sym_round_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_ends_of_bounds_oblivious_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_error_does_not_reference_eager_fallback_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_error_when_passing_mutating_primitive_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_exception_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_expand_copy_export_handles_implicit_true_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_api_with_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_as_backend_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_lifted_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_symbol_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_symbol_scandim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_subclass_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_symbool_pred_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_warns_constant_pred_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_decomp_table_basic_pop_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_decomp_table_container_methods_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_op_lib_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_triton_kernel_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_triton_kernel_mutable_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cyclic_reference_leak_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomp_torture_case_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomp_torture_case_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomps_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomps_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_dynamo_config_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_run_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_container_type_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_state_dict_hooks_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_default_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_keyword_only_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_pytree_kwargs_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_keyword_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_keyword_pytree_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_postional_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_function_schema_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_graph_with_no_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_bug_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_dynamic_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_static_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_linear_preserve_dynamic_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_max_nonstrict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_max_onnx_reported_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_mod_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_preserve_linear_at_aot_level_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_preserve_linear_but_not_custom_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_scan_pytree_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_script_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_statically_known_true_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_then_compile_tensor_ctor_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_autocast_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_fake_tensor_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_inline_constraints_complex_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_inline_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_set_grad_enabled_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_wrong_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_external_call_non_strict_real_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fake_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fake_weights_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_filter_traceback_frames_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_float_conversion_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_float_conversion_from_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fqn_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_from_node_metadata_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_full_on_scalar_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_function_holding_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_hints_wrapper_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_hoo_inline_users_issue_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_if_functional_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_if_post_autograd_op_preserved_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_class_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_class_method_recursive_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_int_shape_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_intermediate_shape_comp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_exporting_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_non_negative_check_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_nonzero_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_isnonzero_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_113041_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_157289_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_istft_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_invalid_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_linear_convd_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_linear_convd_for_training_ir_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_kwarg_dynamic_shapes_diff_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_kwargs_reorder_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_layer_norm_unbacked_normalized_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_layer_sharing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_lazy_module_kwargs_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_linear_conv_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_malformed_fqn_from_source_name_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_map_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_map_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mask_nonzero_static_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_masked_select_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_math_pow_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mismatched_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mixed_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_dict_key_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_input_subclasses_parameterization_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_list_slice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_with_dict_container_inp_out_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_modules_access_for_deleted_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_more_multidimensional_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multidimensional_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multinomial_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multiple_definitions_same_name_dim_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_dynamic_shapes_spec_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_constant_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_init_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nn_module_stack_shared_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_check_is_size_error_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_suggested_fixes_for_data_dependent_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_3_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_persistent_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_strict_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_strict_dynamic_shapes_suggested_fixes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_none_buffers_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonstrict_retrace_preserves_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonzero_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonzero_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_not_registered_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_operator_aten_tensor_mode_variant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_output_node_name_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pad_sequence_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_param_util_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_partial_patched_forward_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_collisions_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_collisions_hoo_subgraphs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_order_variadic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_update_preserving_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_predispatch_cond_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_predispatch_grad_wrappers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_module_call_signature_unflatten_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_requires_grad_placeholders_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_shape_dynamism_for_unused_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_profiling_code_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_python_asserts_with_sym_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pytree_register_data_class_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pytree_register_nested_data_class_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_range_constraints_with_replacement_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_alias_dtype_mismatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_bool_cast_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_errors_on_aliasing_custom_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_for_max_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_size_mismatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_redundant_assert_max_upper_bound_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_redundant_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_refine_dynamic_shapes_from_suggested_fixes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_register_constant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_repeat_interleave_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_replace_unbacked_with_very_large_upperbound_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_replaced_unbacked_bindings_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_reshape_view_helper_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_retracable_ep_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_retrace_pre_autograd_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decomposition_supports_user_input_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decompositions_keep_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decompositions_keep_tensor_constant_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_for_prim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_for_prm_str_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_with_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sdpa_gqa_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sequential_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_example_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_as_side_effect_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_empty_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_setgrad_lifted_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_shared_submodule_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_simple_export_for_training_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_simple_unbacked_view_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_size_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_slice_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_solver_unsupported_sympy_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_specialize_derived_dim_roots_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_split_const_gm_with_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_stack_trace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_stack_trace_make_fx_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_primitives_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_shape_attribute_assignment_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_tensors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_static_dim_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_complicated_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_const_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclasses_parameterization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclasses_parameterization_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggest_torch_checks_with_non_negative_check_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggest_torch_checks_with_regular_check_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_for_data_dependent_errors_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_new_roots_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_float_operators_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_or_sym_and_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_sqrt_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symbool_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symfloat_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_additional_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_ranges_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_shapes_collection_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_tensor_return_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_attribute_zero_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_constant_aten_to_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_constant_with_wrapped_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_multiple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tolist_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_torch_check_eq_commutativity_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_torch_fn_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_trace_under_fake_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_train_eval_on_exported_preautograd_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_3d_matmul_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_bincount_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_bindings_for_divisible_u_symint_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_deferred_runtime_retrace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_expand_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_infer_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_kth_value_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_linear_layer_norm_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_noncontig_lin_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_pad_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_scalar_constructor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_slice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_to_cond_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_to_cond_passthrough_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_unsqueeze_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_buffer_update_child2parent_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_closure_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_isinstance_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_shared_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_state_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_no_unroll_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_placeholder_update_child2parent_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_5_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_6_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_buf_8_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_const_preserving_3_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_const_preserving_3_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_6_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_9_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_preserving_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unused_aliases_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unused_constant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_use_embedding_twice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_user_input_and_buffer_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_assert_separation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_index_assertions_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_tensor_constant_idx_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_wrapper_module_cpp_serdes 2025-09-07T07:30:26.8205845Z 2025-09-07T07:30:26.8206046Z Running inductor/test_cuda_repro 1/1 ... [2025-09-07 07:30:26.794029] 2025-09-07T07:30:26.8206430Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:26.8207383Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_repro.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:26.794359] 2025-09-07T07:30:27.8702540Z 2025-09-07T07:30:27.8703517Z inductor/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_utils_1.1_1afafcfb395d5571_.log 2025-09-07T07:30:27.8706156Z Running 7 items in this shard: test/inductor/test_utils.py::TestUtilsCUDA::testSympySubs_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_flops_fx_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_bfloat16, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_float16, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_float32, test/inductor/test_utils.py::TestUtilsCUDA::test_sympy_str_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_zip_schema_cuda 2025-09-07T07:30:27.8708216Z 2025-09-07T07:30:27.8708385Z Running test_pytree 1/1 ... [2025-09-07 07:30:27.870281] 2025-09-07T07:30:27.8708743Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:27.8710028Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_pytree.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:27.870586] 2025-09-07T07:30:31.7407531Z 2025-09-07T07:30:31.7408606Z test_pytree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_pytree_1.1_f1b6041a2f289249_.log 2025-09-07T07:30:31.7441159Z Running 98 items in this shard: test/test_pytree.py::TestGenericPytree::test_aligned_public_apis, test/test_pytree.py::TestGenericPytree::test_broadcast_to_and_flatten_cxx, test/test_pytree.py::TestGenericPytree::test_broadcast_to_and_flatten_python, test/test_pytree.py::TestGenericPytree::test_enum_treespec_roundtrip_cxx, test/test_pytree.py::TestGenericPytree::test_enum_treespec_roundtrip_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_defaultdict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_defaultdict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_deque_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_deque_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_dict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_dict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_leaf_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_leaf_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_list_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_list_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_namedtuple_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_namedtuple_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_nested_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_nested_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_ordereddict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_ordereddict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_max_cxx, 
test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_max_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_min_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_min_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_tuple_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_tuple_python, test/test_pytree.py::TestGenericPytree::test_flatten_with_is_leaf_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_with_is_leaf_python, test/test_pytree.py::TestGenericPytree::test_is_namedtuple_cxx, test/test_pytree.py::TestGenericPytree::test_is_namedtuple_python, test/test_pytree.py::TestGenericPytree::test_is_structseq_cxx, test/test_pytree.py::TestGenericPytree::test_is_structseq_python, test/test_pytree.py::TestGenericPytree::test_pytree_serialize_bad_input_cxx, test/test_pytree.py::TestGenericPytree::test_pytree_serialize_bad_input_python, test/test_pytree.py::TestGenericPytree::test_register_pytree_node_cxx, test/test_pytree.py::TestGenericPytree::test_register_pytree_node_python, test/test_pytree.py::TestGenericPytree::test_tree_all_any_cxx, test/test_pytree.py::TestGenericPytree::test_tree_all_any_python, test/test_pytree.py::TestGenericPytree::test_tree_map_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_multi_inputs_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_multi_inputs_python, test/test_pytree.py::TestGenericPytree::test_tree_map_only_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_only_predicate_fn_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_only_predicate_fn_python, test/test_pytree.py::TestGenericPytree::test_tree_map_only_python, test/test_pytree.py::TestGenericPytree::test_tree_map_python, test/test_pytree.py::TestPythonPytree::test_constant, test/test_pytree.py::TestPythonPytree::test_constant_default_eq_error, 
test/test_pytree.py::TestPythonPytree::test_constant_default_hash_error, test/test_pytree.py::TestPythonPytree::test_dataclass, test/test_pytree.py::TestPythonPytree::test_deprecated_register_pytree_node, test/test_pytree.py::TestPythonPytree::test_flatten_flatten_with_key_consistency, test/test_pytree.py::TestPythonPytree::test_import_pytree_doesnt_import_optree, test/test_pytree.py::TestPythonPytree::test_key_access, test/test_pytree.py::TestPythonPytree::test_key_str, test/test_pytree.py::TestPythonPytree::test_pytree_context_serialize_bad, test/test_pytree.py::TestPythonPytree::test_pytree_custom_type_serialize, test/test_pytree.py::TestPythonPytree::test_pytree_custom_type_serialize_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_bad_protocol, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_defaultdict_enum, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_enum, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_namedtuple, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_namedtuple_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_register_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec0, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec1, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec2, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec3, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec4, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec5, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec6, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec7, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec8, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec9, test/test_pytree.py::TestPythonPytree::test_register_dataclass_class, test/test_pytree.py::TestPythonPytree::test_saved_serialized, 
test/test_pytree.py::TestPythonPytree::test_tree_flatten_with_path_is_leaf, test/test_pytree.py::TestPythonPytree::test_tree_flatten_with_path_roundtrip, test/test_pytree.py::TestPythonPytree::test_tree_leaves_with_path, test/test_pytree.py::TestPythonPytree::test_tree_map_with_path, test/test_pytree.py::TestPythonPytree::test_tree_map_with_path_multiple_trees, test/test_pytree.py::TestPythonPytree::test_treespec_equality, test/test_pytree.py::TestPythonPytree::test_treespec_repr, test/test_pytree.py::TestCxxPytree::test_pytree_custom_type_serialize, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_namedtuple, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec0, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec1, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec2, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec3, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec4, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec5, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec6, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec7, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec8, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec9, test/test_pytree.py::TestCxxPytree::test_treespec_equality, test/test_pytree.py::TestCxxPytree::test_treespec_repr 2025-09-07T07:30:31.7463343Z 2025-09-07T07:30:31.7463519Z Running inductor/test_fp8 1/1 ... [2025-09-07 07:30:31.740860] 2025-09-07T07:30:31.7463872Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:31.7464760Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:31.741240] 2025-09-07T07:30:34.6203883Z 2025-09-07T07:30:34.6204968Z inductor/test_cuda_repro 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cuda_repro_1.1_31fb43573808a43e_.log 2025-09-07T07:30:34.6227700Z Running 78 items in this shard: test/inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling, test/inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1, test/inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248, test/inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16, test/inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16_config, test/inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_backward_context, test/inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue, test/inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms, test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses, test/inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned, test/inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts, test/inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic, test/inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants, test/inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu, test/inductor/test_cuda_repro.py::CudaReproTests::test_full_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask, test/inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue100806, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103461, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103481, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue104759, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924, test/inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size, test/inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd, test/inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor, test/inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor, test/inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices, test/inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device, test/inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion, test/inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order, test/inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped, test/inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address, test/inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims, test/inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed, test/inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret, test/inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop, test/inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro 2025-09-07T07:30:34.6247599Z 2025-09-07T07:30:34.6247812Z Running dynamo/test_nested_graph_breaks 1/1 ... [2025-09-07 07:30:34.620443] 2025-09-07T07:30:34.6248201Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:34.6249136Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_nested_graph_breaks.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:34.620831] 2025-09-07T07:30:38.4413545Z 2025-09-07T07:30:38.4414783Z dynamo/test_nested_graph_breaks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nested_graph_breaks_1.1_318c83583db76d4d_.log 2025-09-07T07:30:38.4421039Z Running 14 items in this shard: test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_cells, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_differing_arg_nums, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_differing_locals_nums, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_doubly_nested_graph_break, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_inactive_ctx_manager, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_nested_graph_break_in_loop, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_nested_graph_break_in_try_block, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_no_recompiles, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_side_effects_cells, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_side_effects_globals, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_side_effects_globals_different_module, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_single_graph_break, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_single_graph_break_repeat, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_supported_ctx_manager 2025-09-07T07:30:38.4426630Z 2025-09-07T07:30:38.4426875Z Running dynamo/test_pre_dispatch 1/1 ... 
[2025-09-07 07:30:38.441329] 2025-09-07T07:30:38.4427296Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:38.4428363Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_pre_dispatch.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:38.441668] 2025-09-07T07:30:39.1170228Z 2025-09-07T07:30:39.1171328Z inductor/test_fp8 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fp8_1.1_6c7ea271f4f85078_.log 2025-09-07T07:30:39.1267576Z Running 240 items in this shard: test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_device_cpu, 
test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cpu, 
test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_bad_cast, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_bfloat16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_bfloat16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_float16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_float16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True, 
test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_device_cpu, 
test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_device_cuda, 
test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_device_cuda, 
test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_device_cpu, 
test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_15,3,13_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_15,3,13_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_4,2048,4096_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_4,2048,4096_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_15,3,13_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_15,3,13_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_4,2048,4096_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_4,2048,4096_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e4m3fn_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e4m3fn_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e5m2_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e5m2_device_cuda, test/inductor/test_fp8.py::TestFP8Lowering::test_mx_fp8_max_autotune, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False, 
test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True, test/inductor/test_fp8.py::TestFP8Lowering::test_unacceptable_input_dims, test/inductor/test_fp8.py::TestFP8Lowering::test_unacceptable_scale_dims_rowwise_scaling 2025-09-07T07:30:39.1356506Z 2025-09-07T07:30:39.1356878Z Running dynamo/test_fx_passes_pre_grad 1/1 ... [2025-09-07 07:30:39.117565] 2025-09-07T07:30:39.1357400Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:39.1358515Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_passes_pre_grad.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:39.117908] 2025-09-07T07:30:42.1623973Z 2025-09-07T07:30:42.1625358Z dynamo/test_pre_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_pre_dispatch_1.1_0901701d250dc130_.log 2025-09-07T07:30:42.1627063Z Running 3 items in this shard: test/dynamo/test_pre_dispatch.py::PreDispatchTests::test_autocast_simple, test/dynamo/test_pre_dispatch.py::PreDispatchTests::test_enable_grad_and_no_grad, test/dynamo/test_pre_dispatch.py::PreDispatchTests::test_no_grad_simple 2025-09-07T07:30:42.1628149Z 2025-09-07T07:30:42.1628465Z Running inductor/test_combo_kernels 1/1 ... [2025-09-07 07:30:42.162382] 2025-09-07T07:30:42.1629092Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:42.1630257Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_combo_kernels.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:42.162703] 2025-09-07T07:30:42.7385246Z 2025-09-07T07:30:42.7386602Z dynamo/test_fx_passes_pre_grad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_4adc251676ec5a24_.log 2025-09-07T07:30:42.7388682Z Running 1 items in this shard: test/dynamo/test_fx_passes_pre_grad.py::FxPassesPreGradTests::test_pass_execution_and_save 2025-09-07T07:30:42.7389630Z 2025-09-07T07:30:42.7390220Z Running inductor/test_gpu_cpp_wrapper 1/1 ... [2025-09-07 07:30:42.738608] 2025-09-07T07:30:42.7391210Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:42.7392987Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:42.738968] 2025-09-07T07:30:49.4376135Z 2025-09-07T07:30:49.4377368Z inductor/test_combo_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_combo_kernels_1.1_aaec7fbea490c061_.log 2025-09-07T07:30:49.4388011Z Running 20 items in this shard: test/inductor/test_combo_kernels.py::ComboKernelTests::test_2d_blocking_partitioning, test/inductor/test_combo_kernels.py::ComboKernelTests::test_activation_functions, test/inductor/test_combo_kernels.py::ComboKernelTests::test_mutated_args, test/inductor/test_combo_kernels.py::ComboKernelTests::test_reduce_functions, test/inductor/test_combo_kernels.py::ComboKernelTests::test_reduce_split, test/inductor/test_combo_kernels.py::ComboKernelBenchmarkTests::test_2d_blocking_benchmark, test/inductor/test_combo_kernels.py::ComboKernelBenchmarkTests::test_activation_benchmark, test/inductor/test_combo_kernels.py::ComboKernelBenchmarkTests::test_mutated_benchmark, test/inductor/test_combo_kernels.py::ComboKernelBenchmarkTests::test_persistent_reduction_no_x_dim, test/inductor/test_combo_kernels.py::ComboKernelBenchmarkTests::test_reduce_benchmark, test/inductor/test_combo_kernels.py::ComboKernelBenchmarkTests::test_round_robin_dispatch, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_2d_blocking, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_2d_blocking_round_robin, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_activations, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_activations_no_autotune, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_mutated, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda, 
test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_persistent_reduction_no_x_dim, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_persistent_reduction_no_x_dim_2, test/inductor/test_combo_kernels.py::ComboKernelDynamicShapesTests::test_dynamic_shapes_reduce 2025-09-07T07:30:49.4395538Z 2025-09-07T07:30:49.4395806Z Running inductor/test_device_assert 1/1 ... [2025-09-07 07:30:49.437491] 2025-09-07T07:30:49.4396406Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:49.4397464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_device_assert.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:49.437866] 2025-09-07T07:30:52.4682151Z 2025-09-07T07:30:52.4683542Z inductor/test_gpu_cpp_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_064dcf1c16060a29_.log 2025-09-07T07:30:52.4797699Z Running 294 items in this shard: test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex4_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_adding_tensor_offsets_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_addmm_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_aoti_debug_printer_works_on_constants, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_as_strided_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_batch_norm_2d_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bernoulli1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bitwise_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_buffer_use_after_remove_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_slice_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_consecutive_split_cumprod_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_conv_backward_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_convolution1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_uint8_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_bfloat16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_fusion_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_bfloat16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_embedding_bag_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_enable_dynamic_shapes_cpp_wrapper_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_real_output_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_foreach_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_put_deterministic_fallback_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_tensor_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_inductor_layout_optimization_input_mutations_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_insignificant_strides_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_layer_norm_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear_relu_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm2_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm3_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_views_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_device_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_threading_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_h_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_he_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pow3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_profiler_mark_wrapper_call_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_randint_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_reduction1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_relu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_repeat_interleave_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_roi_align_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scalar_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_efficient_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_silu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sort_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_dtype_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_int_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_transpose_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int8_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex4_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_adding_tensor_offsets_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_addmm_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_annotation_training, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_as_strided_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_batch_norm_2d_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bernoulli1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bitwise_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_buffer_use_after_remove_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_slice_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_consecutive_split_cumprod_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_conv_backward_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_convolution1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int32_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_fusion_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int64_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float64_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_uint8_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_embedding_bag_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_enable_dynamic_shapes_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_real_output_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_foreach_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_put_deterministic_fallback_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_tensor_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_inductor_layout_optimization_input_mutations_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_insignificant_strides_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_layer_norm_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear_relu_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm2_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm3_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_views_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_device_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_threading_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_h_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_he_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pow3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_profiler_mark_wrapper_call_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_randint_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_reduction1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_relu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_repeat_interleave_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_roi_align_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scalar_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_attention_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_efficient_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_silu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sort_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_dtype_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_int_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_transpose_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_uint8_cuda_dynamic_shapes_gpu_wrapper 2025-09-07T07:30:52.4907317Z 2025-09-07T07:30:52.4907577Z Running 
inductor/test_op_completeness 1/1 ... [2025-09-07 07:30:52.468777] 2025-09-07T07:30:52.4908223Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:52.4909385Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_completeness.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:52.469152] 2025-09-07T07:30:56.3125453Z 2025-09-07T07:30:56.3126840Z inductor/test_device_assert 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_device_assert_1.1_083bf936287a6baf_.log 2025-09-07T07:30:56.3130278Z Running 4 items in this shard: test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_fusion, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_run_assert_triton 2025-09-07T07:30:56.3132646Z 2025-09-07T07:30:56.3132956Z Running export/test_tools 1/1 ... [2025-09-07 07:30:56.312575] 2025-09-07T07:30:56.3133679Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:56.3135290Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tools.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:56.312977] 2025-09-07T07:30:56.5398463Z 2025-09-07T07:30:56.5399489Z inductor/test_op_completeness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_completeness_1.1_d3ddb6c4765a6f47_.log 2025-09-07T07:30:56.5402458Z Running 5 items in this shard: test/inductor/test_op_completeness.py::TestOpCompleteness::test_cpp_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_cpp_vec_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_halide_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_metal_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_triton_overrides 2025-09-07T07:30:56.5404306Z 2025-09-07T07:30:56.5404589Z Running dynamo/test_subgraphs 1/1 ... [2025-09-07 07:30:56.539958] 2025-09-07T07:30:56.5405050Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:56.5406331Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_subgraphs.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:30:56.540405] 2025-09-07T07:30:59.3171409Z 2025-09-07T07:30:59.3173371Z inductor/test_aot_inductor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_1.1_ba53739d6e4ac528_.log 2025-09-07T07:30:59.3504537Z Running 867 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorLoggingTest::test_shape_env_reuse, test/inductor/test_aot_inductor.py::AOTInductorLoggingTest::test_shape_env_reuse_zero_consts_use_consts_asm_false, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_explicit_set, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_package_cpp_false_raises, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_sets_package_cpp, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_no_compile_standalone, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__int_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_add_complex_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aliased_buffer_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_name_collision_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_fp8_dtype_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_sym_inputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_user_defined_triton_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printing_model_inputs_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_runtime_asserts_backed_symint_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_runtime_asserts_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_assert_async_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_assert_tensor_meta_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotune_int64_user_defined_triton_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotune_with_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bool_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_3_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_4_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_and_force_mmap_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_clamp_decomposition_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_composed_dynamic_size_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_nested_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicte_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_use_buffers_from_outer_scope_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_multiple_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_outer_code_before_after_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_reinterpret_view_inputs_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_consecutive_compiles_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_with_update_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_original_fqn_and_dtype_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_type_propagation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_conv3d_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_conv_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_convolution_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_copy_non_blocking_is_pinned_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_custom_op_in_subgraph_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_d2h_copy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_deconv_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicate_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dynamic_cat_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dynamic_scalar_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dynamic_smem_above_default_limit_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_embedding_bag_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_cat_dtype_promotion_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_graph_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_extract_constants_map_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fallback_kernel_with_symexpr_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fallback_mem_leak_fix_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fft_c2c_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fill__fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fqn_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_free_inactive_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_with_none_index_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_inf_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_input_codegen_with_sympy_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_issue_140766_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_dynamic_dim_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_load_package_multiple_gpus_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_masked_select_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_missing_cubin_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_missing_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_model_modified_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multi_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multiple_output_alias_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nan_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_narrow_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_no_args_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_contiguous_output_alias_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_default_gpu_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_none_args_aot_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_normal_functional_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_misaligned_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_non_zero_memory_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_profile_benchmark_harness_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_hann_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_permute_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pytree_inputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quanatized_int8_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_bias_none_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_interleave_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_calling_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_rocm_triton_autotuning_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_run_with_grad_enabled_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_complex_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_device_type_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_dtype_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_fp8_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_large_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_same_backing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_dot_product_efficient_attention_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_reduce_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sdpa_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sdpa_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_seq_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_shifted_constraint_ranges_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_False_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_False_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_True_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_True_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_multi_arch_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_from_multi_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_and_mul_expr_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_expr_transitive_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_small_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stft_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stride_with_unbacked_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_i64_input_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symfloat_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symint_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sympy_cpp_printer_min_max_minmax0_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sympy_cpp_printer_min_max_minmax1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_torchvision_transforms_functional_tensor_resize_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_autotuning_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_bool_param_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_dynamic_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_dynamic_shape_with_div_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_extern_kernel_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_multi_output_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_mem_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_sympy_expr_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_sympy_fn_like_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_weird_param_order_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_mutated_autotuning_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_next_power_of_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_inactive_constant_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_user_managed_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_upper_bound_i64_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_using_model_name_for_files_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_view_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_nested_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_mixed_device_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_mixed_device_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_buffers_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_code_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_parameters_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_pytree_inputs_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_no_triton_profiler_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_grid_with_backed_symbols_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_grid_with_unbacked_symbols_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__int_mm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_add_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aliased_buffer_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aot_inductor_consts_cpp_build_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_fp8_dtype_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_sym_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_user_defined_triton_kernel_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printing_model_inputs_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_profiler_enable_kernel_profile_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_profiler_enable_kernel_profile_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_runtime_asserts_backed_symint_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_runtime_asserts_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_int64_user_defined_triton_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotuning_args_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_backward_no_op_logging_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bool_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_boolean_indexing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_3_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_4_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_and_force_mmap_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_clamp_decomposition_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_mismatched_branch_output_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_mismatched_branch_output_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_share_predicte_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_use_buffers_from_outer_scope_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_multiple_outputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_outer_code_before_after_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_reinterpret_view_inputs_outputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_consecutive_compiles_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_with_update_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_original_fqn_and_dtype_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_type_propagation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_convolution_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_custom_op_in_subgraph_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_d2h_copy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dup_unbacked_sym_decl_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dup_unbacked_sym_decl_with_refinement_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_duplicate_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_duplicated_params_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_scalar_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_smem_above_default_limit_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_embedding_bag_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_graph_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_extract_constants_map_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fallback_kernel_with_symexpr_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fallback_mem_leak_fix_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fft_c2c_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_foreach_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_view_of_param_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fqn_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_free_inactive_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fx_gm_return_tuple_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_index_put_fallback_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_index_put_with_none_index_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_int_list_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_issue_140766_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_dynamic_dim_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_dynamic_maxautotune_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_load_package_multiple_gpus_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_masked_select_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misc_1_max_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misc_1_max_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_missing_cubin_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_missing_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_model_modified_weights_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multiple_output_alias_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_narrow_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nested_tensor_from_jagged_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_contiguous_output_alias_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_default_gpu_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_output_misaligned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_output_path_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_output_path_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_non_zero_memory_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_poi_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_profile_benchmark_harness_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_abs_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_hann_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_permute_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_squeeze_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quanatized_int8_linear_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_bias_none_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeat_interleave_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeat_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replicate_on_devices_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_view_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_run_with_grad_enabled_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_dtype_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_fp8_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_large_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_shape_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_reduce_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_seq_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_shifted_constraint_ranges_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_False_max_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_False_max_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_True_max_autotune_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_True_max_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_split_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_from_multi_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_so_without_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stride_with_unbacked_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_subclasses_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_expr_indexing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_i64_input_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symbool_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symfloat_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symint_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax1_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_torchvision_transforms_functional_tensor_resize_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_dynamic_launcher_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_bool_param_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_extern_kernel_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_multi_output_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_mem_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_weird_param_order_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_with_none_input_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_mutated_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_next_power_of_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_update_constant_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_update_constant_buffer_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_update_inactive_constant_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_update_user_managed_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_using_model_name_for_files_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_view_outputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_outer_buffers_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_outer_code_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_pytree_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_cudagraphs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_no_triton_profiler_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_offset_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_backed_symbols_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_unbacked_symbols_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__int_mm_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_add_complex_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aot_inductor_consts_cpp_build_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_name_collision_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_cpp_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_fp8_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_sym_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_user_defined_triton_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printing_model_inputs_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_profiler_enable_kernel_profile_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_profiler_enable_kernel_profile_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_backed_symint_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_async_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_int64_user_defined_triton_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotuning_args_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bmm_multiple_dynamic_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bool_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_boolean_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_3_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_4_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_and_force_mmap_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_clamp_decomposition_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_non_tensor_predicates_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_non_tensor_predicates_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_share_predicte_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_False_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_use_buffers_from_outer_scope_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_multiple_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_outer_code_before_after_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_reinterpret_view_inputs_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_consecutive_compiles_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_folding_with_update_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_original_fqn_and_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_type_propagation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_conv3d_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_conv_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_copy_non_blocking_is_pinned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_custom_op_in_subgraph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_d2h_copy_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_deconv_freezing_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dup_unbacked_sym_decl_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dup_unbacked_sym_decl_with_refinement_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_duplicate_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_duplicated_params_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_smem_above_default_limit_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_cat_dtype_promotion_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_extract_constants_map_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_mem_leak_fix_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fft_c2c_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fx_gm_return_tuple_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_with_none_index_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_input_codegen_with_sympy_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_int_list_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_dynamic_dim_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_linear_dynamic_maxautotune_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_linear_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_load_package_multiple_gpus_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misaligned_input_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misaligned_input_2_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_cubin_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_model_modified_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_multi_device_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_multiple_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nan_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nested_tensor_from_jagged_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_no_args_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_contiguous_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_default_gpu_device_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_tensor_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_none_args_aot_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_normal_functional_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_on_gpu_device1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_misaligned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_non_zero_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_profile_benchmark_harness_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_abs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_hann_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_squeeze_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quanatized_int8_linear_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replicate_on_devices_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_constant_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_view_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_rocm_triton_autotuning_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_complex_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_large_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_shape_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scaled_dot_product_efficient_attention_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_reduce_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_shifted_constraint_ranges_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_True_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_True_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_multi_arch_embed_kernel_binary_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_split_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_and_mul_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_so_without_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stft_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stride_with_unbacked_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_subclasses_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_i64_input_codegen_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symfloat_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax0_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_torchvision_transforms_functional_tensor_resize_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_autotuning_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_dynamic_launcher_grid_infer_from_tensor_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_dynamic_launcher_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_bool_param_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_dynamic_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_dynamic_shape_with_div_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_float_arg_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_float_arg_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_extern_kernel_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_multi_output_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_fn_like_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_weird_param_order_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_mutated_autotuning_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_constant_buffer_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_inactive_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_upper_bound_i64_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_view_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_weight_on_disk_legacy_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_code_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_cudagraphs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_no_triton_profiler_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_offset_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_backed_symbols_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_size_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_size_weight_mps 2025-09-07T07:30:59.3820547Z 2025-09-07T07:30:59.3820972Z Running dynamo/test_dynamic_shapes 1/1 ... [2025-09-07 07:30:59.318802] 2025-09-07T07:30:59.3821498Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:30:59.3822644Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dynamic_shapes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:30:59.319132] 2025-09-07T07:31:00.2332906Z 2025-09-07T07:31:00.2334094Z export/test_tools 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tools_1.1_711aa0dbb28148ac_.log 2025-09-07T07:31:00.2335975Z Running 2 items in this shard: test/export/test_tools.py::TestExportTools::test_report_exportability_basic, test/export/test_tools.py::TestExportTools::test_report_exportability_with_issues 2025-09-07T07:31:00.2336998Z 2025-09-07T07:31:00.2337328Z Running inductor/test_aot_inductor_utils 1/1 ... 
[2025-09-07 07:31:00.233302] 2025-09-07T07:31:00.2338231Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:00.2339817Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:00.233630] 2025-09-07T07:31:00.4609690Z 2025-09-07T07:31:00.4611035Z dynamo/test_subgraphs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_subgraphs_1.1_d99ea0c57cb841f4_.log 2025-09-07T07:31:00.4624577Z Running 44 items in this shard: test/dynamo/test_subgraphs.py::SubGraphTests::test_capi_call1, test/dynamo/test_subgraphs.py::SubGraphTests::test_capi_call2, test/dynamo/test_subgraphs.py::SubGraphTests::test_capi_call3, test/dynamo/test_subgraphs.py::SubGraphTests::test_control_flow1, test/dynamo/test_subgraphs.py::SubGraphTests::test_control_flow2, test/dynamo/test_subgraphs.py::SubGraphTests::test_control_flow3, test/dynamo/test_subgraphs.py::SubGraphTests::test_control_flow4, test/dynamo/test_subgraphs.py::SubGraphTests::test_control_flow5, test/dynamo/test_subgraphs.py::SubGraphTests::test_dynamic_duck_size, test/dynamo/test_subgraphs.py::SubGraphTests::test_dynamic_getitem, test/dynamo/test_subgraphs.py::SubGraphTests::test_dynamic_kwarg, test/dynamo/test_subgraphs.py::SubGraphTests::test_dynamic_order_dependence, test/dynamo/test_subgraphs.py::SubGraphTests::test_dynamic_zero_inference, test/dynamo/test_subgraphs.py::SubGraphTests::test_enumerate_not_break_graph, test/dynamo/test_subgraphs.py::SubGraphTests::test_extended_args, test/dynamo/test_subgraphs.py::SubGraphTests::test_graph_break_on_item, test/dynamo/test_subgraphs.py::SubGraphTests::test_indirect_unsupported1, test/dynamo/test_subgraphs.py::SubGraphTests::test_indirect_unsupported2, 
test/dynamo/test_subgraphs.py::SubGraphTests::test_indirect_unsupported3, test/dynamo/test_subgraphs.py::SubGraphTests::test_multigraph, test/dynamo/test_subgraphs.py::SubGraphTests::test_no_graph_break_on_item, test/dynamo/test_subgraphs.py::SubGraphTests::test_pop_after_resume, test/dynamo/test_subgraphs.py::SubGraphTests::test_restore_range, test/dynamo/test_subgraphs.py::SubGraphTests::test_restore_range_iter, test/dynamo/test_subgraphs.py::SubGraphTests::test_restore_state, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume1, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume2, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume3, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume4, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume5, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume_freevars, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume_paths_join, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume_tuple_iterator, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume_with_no_grad1, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume_with_no_grad2, test/dynamo/test_subgraphs.py::SubGraphTests::test_resume_with_no_grad3, test/dynamo/test_subgraphs.py::SubGraphTests::test_stack_state1, test/dynamo/test_subgraphs.py::SubGraphTests::test_stack_state2, test/dynamo/test_subgraphs.py::SubGraphTests::test_start1, test/dynamo/test_subgraphs.py::SubGraphTests::test_start2, test/dynamo/test_subgraphs.py::SubGraphTests::test_start3, test/dynamo/test_subgraphs.py::SubGraphTests::test_start4, test/dynamo/test_subgraphs.py::SubGraphTests::test_tuple_iterator_mutate, test/dynamo/test_subgraphs.py::SubGraphTests::test_tuple_iterator_return 2025-09-07T07:31:00.4635256Z 2025-09-07T07:31:00.4635541Z Running functorch/test_ops 1/3 ... 
[2025-09-07 07:31:00.461183] 2025-09-07T07:31:00.4635988Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:00.4637079Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:00.461580] 2025-09-07T07:31:07.8590274Z 2025-09-07T07:31:07.8591897Z inductor/test_aot_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_utils_1.1_0733991d80226649_.log 2025-09-07T07:31:07.8593132Z Running 0 items in this shard: 2025-09-07T07:31:07.8593409Z 2025-09-07T07:31:07.8593779Z Running functorch/test_ops 2/3 ... [2025-09-07 07:31:07.859013] 2025-09-07T07:31:07.8594511Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:07.8595929Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'not serial', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:31:07.859328] 2025-09-07T07:31:18.4539035Z 2025-09-07T07:31:18.4540284Z functorch/test_ops 1/3 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_1.3_2ce17a83c4db6117_.log 2025-09-07T07:31:18.5584909Z Running 3388 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_layer_norm_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_nll_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_any_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diff_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hypot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_multi_dot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gaussian_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rrelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_ndtr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argwhere_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eigh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mul_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_similarity_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool3d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_slice_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tanh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SortGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gradient_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_cumprod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_tanhshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_searchsorted_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sqrt_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_argmax_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_argmin_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_ceil_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_floor_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_topk_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ceil_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_T_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_expand_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_hsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_unbind_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_narrow_grad_op_jvp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_narrow_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_conj_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_select_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unfold_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unfold_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unsqueeze_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rsub___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_physical_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flipud_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_binary_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_inf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_as_complex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_any_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cumulative_trapezoid_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flipud_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lstsq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polar_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_bartlett_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_take_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cov_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ldexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nansum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pinverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vsplit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_lengths_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_decomposed_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_all_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argwhere_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cartesian_prod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cartesian_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dsplit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gradient_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hash_tensor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isnan_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_4inputs_with_extra_args_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_diagonal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eig_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_householder_product_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_subgradients_at_zero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vander_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_xor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mH_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_median_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_movedim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nan_to_num_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_batch_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_bag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_instance_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_area_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest-exact_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_head_attention_forward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pairwise_distance_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rrelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_with_dtype_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softplus_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_inf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polar_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_positive_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_put_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_qr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_quantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rad2deg_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_real_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_renorm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_neg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_roll_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_kaiser_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_nuttall_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_with_dtype_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_v_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i0e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_legendre_polynomial_p_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_log_ndtr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_xlog1py_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tan_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trapezoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_true_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zero__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_no_rounding_mode_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ZeroGradientsGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_corrcoef_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exponential_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardsigmoid_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_inf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reshape_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unbind_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cross_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmedian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_area_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_laguerre_polynomial_l_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_true_divide_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__native_batch_norm_legit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cummin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_no_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_replicate_negative_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SortGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_baddbmm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dsplit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_H_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumprod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_full_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cross_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_instance_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rms_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_prod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mT_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_dropout_backward_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_area_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_selu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ravel_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_slice_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_transpose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rdiv___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cartesian_prod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfc_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igamma_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logcumsumexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_neg_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_area_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_number_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_general_cosine_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_with_sizes_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_SortGenVmapAutogradFunction_cuda_float32 2025-09-07T07:31:18.6599592Z 2025-09-07T07:31:18.6599873Z Running inductor/test_cpu_select_algorithm 1/1 ... [2025-09-07 07:31:18.458287] 2025-09-07T07:31:18.6600416Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:18.6601569Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_select_algorithm.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:31:18.458628] 2025-09-07T07:31:25.5832871Z 2025-09-07T07:31:25.5834378Z inductor/test_cpu_select_algorithm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_select_algorithm_1.1_d9788db22c9bfb36_.log 2025-09-07T07:31:25.5835681Z Running 0 items in this shard: 2025-09-07T07:31:25.5835975Z 2025-09-07T07:31:25.5836293Z Running xpu/test_gemm 1/1 ... [2025-09-07 07:31:25.583293] 2025-09-07T07:31:25.5836854Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:25.5838454Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:25.583638] 2025-09-07T07:31:29.3541433Z 2025-09-07T07:31:29.3542776Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_6160d023c2d29264_.log 2025-09-07T07:31:29.3544451Z Running 0 items in this shard: 2025-09-07T07:31:29.3544817Z 2025-09-07T07:31:29.3545196Z Running higher_order_ops/test_invoke_quant 1/1 ... [2025-09-07 07:31:29.354283] 2025-09-07T07:31:29.3546054Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:29.3549037Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'higher_order_ops/test_invoke_quant.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:31:29.354684] 2025-09-07T07:31:36.5797099Z 2025-09-07T07:31:36.5798960Z higher_order_ops/test_invoke_quant 1/1 was successful, full logs can be found in artifacts with path test/test-reports/higher_order_ops.test_invoke_quant_1.1_5148575b5fb5a8ba_.log 2025-09-07T07:31:36.5805770Z Running 14 items in this shard: test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_construct_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_multiple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_simple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_construct_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_multiple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_simple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_construct_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_multiple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_pattern_matching, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_prologue, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_simple 2025-09-07T07:31:36.5811757Z 2025-09-07T07:31:36.5812077Z Running inductor/test_online_softmax 1/1 ... 
[2025-09-07 07:31:36.579814] 2025-09-07T07:31:36.5812583Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:36.5813784Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_online_softmax.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:36.580266] 2025-09-07T07:31:43.8555556Z 2025-09-07T07:31:43.8556863Z inductor/test_online_softmax 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_online_softmax_1.1_650e8dbc98131ea7_.log 2025-09-07T07:31:43.8569226Z Running 30 items in this shard: test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_causal_mask, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_codegen_3pass_softmax_due_to_disable, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_codegen_online_softmax_V_2048_use_log_softmax_False, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_codegen_online_softmax_V_2048_use_log_softmax_True, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_codegen_online_softmax_V_50304_use_log_softmax_False, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_codegen_online_softmax_V_50304_use_log_softmax_True, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_codegen_softmax_persistent_reduction, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_log_softmax, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_no_online_softmax_for_cpu, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_acc_with_fp64_bfloat16, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_acc_with_fp64_float16, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_acc_with_fp64_float32, 
test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_nrow_2048_dim_-1, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_nrow_2048_dim_0, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_nrow_2048_dim_1, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_nrow_2_dim_-1, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_nrow_2_dim_0, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_nrow_2_dim_1, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_prepare_softmax_perf, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_sdpa, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax_acc_with_fp64_fn0_bfloat16, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax_acc_with_fp64_fn0_float16, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax_acc_with_fp64_fn0_float32, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax_acc_with_fp64_fn1_bfloat16, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax_acc_with_fp64_fn1_float16, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmax_acc_with_fp64_fn1_float32, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_softmin, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_split_reduction, test/inductor/test_online_softmax.py::TestOnlineSoftmax::test_tb_speech_transformer_attn 2025-09-07T07:31:43.8579202Z 2025-09-07T07:31:43.8579480Z Running inductor/test_split_cat_fx_passes 1/1 ... 
[2025-09-07 07:31:43.855617] 2025-09-07T07:31:43.8580018Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:43.8581216Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_split_cat_fx_passes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:43.856025] 2025-09-07T07:31:51.1817962Z 2025-09-07T07:31:51.1819478Z inductor/test_split_cat_fx_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_split_cat_fx_passes_1.1_1367d3d3fc986c11_.log 2025-09-07T07:31:51.1825700Z Running 11 items in this shard: test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_cat_normalization, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_config_flag_is_respected, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_consecutive_split_merge, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_numpy_compat_normalization, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_split_cat_merge, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_split_cat_merge_mutation, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_split_cat_new_patterns, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_split_normalization, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_split_squeeze, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_stack_normalization_axis_kwarg, test/inductor/test_split_cat_fx_passes.py::TestSplitCatFxPasses::test_unbind_stack 2025-09-07T07:31:51.1831383Z 2025-09-07T07:31:51.1831707Z Running test_cuda_expandable_segments 1/1 ... 
[2025-09-07 07:31:51.181913] 2025-09-07T07:31:51.1832326Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:51.1833660Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_expandable_segments.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:51.182259] 2025-09-07T07:31:56.1541811Z 2025-09-07T07:31:56.1542960Z test_cuda_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_expandable_segments_1.1_2d25dd68ce78fb65_.log 2025-09-07T07:31:56.1543686Z 2025-09-07T07:31:56.1544167Z Running test_type_hints 1/1 ... [2025-09-07 07:31:56.154248] 2025-09-07T07:31:56.1544865Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:56.1547883Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_hints.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:31:56.154591] 2025-09-07T07:31:59.7746482Z 2025-09-07T07:31:59.7747539Z test_type_hints 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_hints_1.1_29d1731192676e65_.log 2025-09-07T07:31:59.7748717Z Running 1 items in this shard: test/test_type_hints.py::TestTypeHints::test_doc_examples 2025-09-07T07:31:59.7749468Z 2025-09-07T07:31:59.7749821Z Running dynamo/test_unittest 1/1 ... [2025-09-07 07:31:59.774713] 2025-09-07T07:31:59.7750465Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:31:59.7752623Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_unittest.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:31:59.775057] 2025-09-07T07:32:02.4402319Z 2025-09-07T07:32:02.4404553Z inductor/test_torchinductor_opinfo 1/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_1.12_b8a1a07c246d38e4_.log 2025-09-07T07:32:02.4521494Z Running 292 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_xor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hypot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igammac_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svd_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorinv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logaddexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_kaiser_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_uniform_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_uint8 2025-09-07T07:32:02.4632399Z 2025-09-07T07:32:02.4632761Z Running dynamo/test_guard_serialization 1/1 ... [2025-09-07 07:32:02.440653] 2025-09-07T07:32:02.4633327Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:02.4634390Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_serialization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:02.440976] 2025-09-07T07:32:03.5452598Z 2025-09-07T07:32:03.5453809Z dynamo/test_unittest 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_unittest_1.1_cac68f900947b347_.log 2025-09-07T07:32:03.5455478Z Running 1 items in this shard: test/dynamo/test_unittest.py::TestUnittest::test_SkipTest 2025-09-07T07:32:03.5456140Z 2025-09-07T07:32:03.5456764Z Running functorch/test_minifier 1/1 ... [2025-09-07 07:32:03.545322] 2025-09-07T07:32:03.5457415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:03.5459312Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_minifier.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:03.545726] 2025-09-07T07:32:07.6163264Z 2025-09-07T07:32:07.6164900Z functorch/test_minifier 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_minifier_1.1_707eefab3a0710e1_.log 2025-09-07T07:32:07.6168817Z Running 5 items in this shard: test/functorch/test_minifier.py::TestMinifier::test_has_add_mul, test/functorch/test_minifier.py::TestMinifier::test_has_mul_minifier, test/functorch/test_minifier.py::TestMinifier::test_input_returned, test/functorch/test_minifier.py::TestMinifier::test_module, test/functorch/test_minifier.py::TestMinifier::test_tup_use 2025-09-07T07:32:07.6170325Z 2025-09-07T07:32:07.6170639Z Running test_legacy_vmap 1/1 ... [2025-09-07 07:32:07.616480] 2025-09-07T07:32:07.6171143Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:07.6172383Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_legacy_vmap.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:07.616975] 2025-09-07T07:32:08.7522896Z 2025-09-07T07:32:08.7524290Z inductor/test_torchinductor_opinfo 8/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_8.12_99744c517000d67f_.log 2025-09-07T07:32:08.7631907Z Running 270 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_shapes_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vector_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logaddexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pinverse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_bartlett_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_uniform_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_unbiased_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int32 2025-09-07T07:32:08.7734726Z 
2025-09-07T07:32:08.7735123Z Running dynamo/test_cudagraphs_expandable_segments 1/1 ... [2025-09-07 07:32:08.752689] 2025-09-07T07:32:08.7735851Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:08.7736969Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs_expandable_segments.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:08.752983] 2025-09-07T07:32:09.3650182Z 2025-09-07T07:32:09.3651897Z dynamo/test_guard_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_serialization_1.1_ae06d2b5cf698b01_.log 2025-09-07T07:32:09.3679730Z Running 38 items in this shard: test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bool_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_builtin_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_constant_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_default_device, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_deterministic_algorithms, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_contains, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_version, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dispatch_key_set_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dual_level, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_duplicate_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_empty_nn_module_hooks_dict, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_equals_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_fsdp_training_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_locals, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_functorch_stack_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode_loading, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_hasattr_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match_with_config, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_mapping_keys_check, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_name_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_nn_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_none_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_not_present_in_generic_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_range_iterator_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sequence_length, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_shape_env, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_skipped_objects, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_subclass_metadata_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tuple_iterator_len, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_type_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_weakref_alive 2025-09-07T07:32:09.3691738Z 2025-09-07T07:32:09.3692135Z Running torch_np/numpy_tests/core/test_einsum 1/1 ... [2025-09-07 07:32:09.365144] 2025-09-07T07:32:09.3692876Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:09.3694036Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_einsum.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:09.365491] 2025-09-07T07:32:12.6732008Z 2025-09-07T07:32:12.6733822Z dynamo/test_cudagraphs_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_expandable_segments_1.1_7b0b77142ede3d3f_.log 2025-09-07T07:32:12.6742167Z Running 8 items in this shard: test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutated_metadata 2025-09-07T07:32:12.6746032Z 2025-09-07T07:32:12.6746432Z Running inductor/test_benchmarking 1/1 ... 
[2025-09-07 07:32:12.673178] 2025-09-07T07:32:12.6747310Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:12.6748808Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmarking.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:12.673544] 2025-09-07T07:32:13.1853741Z 2025-09-07T07:32:13.1855415Z torch_np/numpy_tests/core/test_einsum 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_einsum_1.1_23279c95227dcdc1_.log 2025-09-07T07:32:13.1879080Z Running 50 items in this shard: test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_broadcasting_dot_cases, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_collapse, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_combined_views_mapping, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_complex, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_B, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_D, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_F, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_b, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_d, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_e, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_f, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_h, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_i, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_different_paths_dtype_l, 
test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_edge_cases, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_all_contig_non_contig_output, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_broadcast, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_errors, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_failed_on_p9_and_s390x, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_fixed_collapsingbug, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_fixedstridebug, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_misc, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_cfloat128, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_cfloat64, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_float16, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_float32, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_float64, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_int16, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_int32, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_int64, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_int8, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_sums_uint8, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_einsum_views, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_expand, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_hadamard_like_products, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_index_transformations, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_inner_product, 
test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_out_is_res, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_output_order, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_random_cases, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_small_boolean_arrays, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsum::test_subscript_range, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_edge_paths, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_long_paths, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_memory_contraints, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_path_type_input, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_path_type_input_internal_trace, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_path_type_input_invalid, test/torch_np/numpy_tests/core/test_einsum.py::TestEinsumPath::test_spaces, test/torch_np/numpy_tests/core/test_einsum.py::TestMisc::test_overlap 2025-09-07T07:32:13.1894291Z 2025-09-07T07:32:13.1894691Z Running dynamo/test_model_output 1/1 ... [2025-09-07 07:32:13.185580] 2025-09-07T07:32:13.1895186Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:13.1896204Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_model_output.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:13.185923] 2025-09-07T07:32:13.6401753Z 2025-09-07T07:32:13.6403101Z test_legacy_vmap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_legacy_vmap_1.1_f38afc4064ee155a_.log 2025-09-07T07:32:13.6438723Z Running 124 items in this shard: test/test_legacy_vmap.py::TestVmapAPILegacy::test_accepts_nested_inputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_backward_unsupported_interaction, test/test_legacy_vmap.py::TestVmapAPILegacy::test_batched_gradient_basic, test/test_legacy_vmap.py::TestVmapAPILegacy::test_constant_function, test/test_legacy_vmap.py::TestVmapAPILegacy::test_different_map_dim_size_raises, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_atan2, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_does_not_warn_by_default, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_masked_fill, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_multiple_returns, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_warns_when_warnings_are_enabled, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_with_undefined_grad, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_zero_dim, test/test_legacy_vmap.py::TestVmapAPILegacy::test_func_with_no_inputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_functools_partial, test/test_legacy_vmap.py::TestVmapAPILegacy::test_grad_unsupported_interaction, test/test_legacy_vmap.py::TestVmapAPILegacy::test_in_dim_not_in_tensor_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_in_dims_wrong_type_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_inplace_fallback_nary_different_levels, test/test_legacy_vmap.py::TestVmapAPILegacy::test_inplace_fallback_nary_same_levels, test/test_legacy_vmap.py::TestVmapAPILegacy::test_inplace_fallback_unary, test/test_legacy_vmap.py::TestVmapAPILegacy::test_integer_in_dim_but_not_tensor_input_err_msg, 
test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_inputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_outputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_outputs_error_cases, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_non_default_in_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_with_different_map_dim, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_with_same_map_dim, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nn_module, test/test_legacy_vmap.py::TestVmapAPILegacy::test_non_default_in_dims_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_non_tensor_output_raises, test/test_legacy_vmap.py::TestVmapAPILegacy::test_non_zero_in_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_none_in_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nonzero_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_noop_in_inner_vmap, test/test_legacy_vmap.py::TestVmapAPILegacy::test_not_enough_in_dims_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dim_out_of_bounds_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dims_and_num_outputs_mismatch_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dims_edge_case, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dims_must_be_int_or_tuple_of_int_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_single_input, test/test_legacy_vmap.py::TestVmapAPILegacy::test_unsupported_op_err_msg, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_T_numpy, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_as_strided, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_binary_pointwise_ops, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_bmm, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_cat, 
test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_chunk, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_clamp, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_clone, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_comparison_ops, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_conj, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_contiguous, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_diagonal, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_dot, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_expand_as, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_fill_and_zero_inplace, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_imag, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_is_complex, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_is_contiguous, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_is_floating_point, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_mm, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_movedim, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_mv, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_narrow, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_new_empty, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_new_empty_strided, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_new_zeros, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_no_random_op_support, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_real, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_reshape, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_reshape_as, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_result_type, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_select, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_slice, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_split, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_squeeze, 
test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_stack, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_stride, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_sum_dim, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_t, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_tensor_split, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_to, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_trace, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_transpose, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_unary_pointwise_ops, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_unbind, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_unfold, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view_as, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view_as_complex, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view_as_real, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_vmap_fallback_check, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_vmap_fallback_check_ok, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_add_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_binary_cross_entropy_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_diagonal_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_div_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_expand_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_index_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_inplace_manyview_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_inplace_on_view_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_lgamma_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_log1p_cuda, 
test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_log_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_logsumexp_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_max_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_median_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_min_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_mul_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_permute_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_reshape_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_select_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_sigmoid_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_slice_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_stack_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_sub_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_threshold_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_trace_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_unrelated_output_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_unrelated_output_multiple_grad_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_vmap_fallback_check, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_vmap_fallback_check_ok 2025-09-07T07:32:13.6468588Z 2025-09-07T07:32:13.6477671Z Running torch_np/test_basic 1/1 ... 
[2025-09-07 07:32:13.640521] 2025-09-07T07:32:13.6478117Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:13.6479052Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_basic.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:13.640883] 2025-09-07T07:32:13.9659996Z 2025-09-07T07:32:13.9661116Z inductor/test_torchinductor_opinfo 12/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.12_a77273b97f0b8e5c_.log 2025-09-07T07:32:13.9784346Z Running 311 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_offsets_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_decomposed_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hypot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_det_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_slogdet_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_multinomial_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_glu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pinverse_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rand_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rand_like_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hamming_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_bool 2025-09-07T07:32:13.9900479Z 2025-09-07T07:32:13.9900670Z Running test_segment_reductions 1/1 ... [2025-09-07 07:32:13.966667] 2025-09-07T07:32:13.9901043Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:13.9901960Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_segment_reductions.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:13.967071] 2025-09-07T07:32:17.4564971Z 2025-09-07T07:32:17.4566155Z dynamo/test_model_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_model_output_1.1_a1b54f09647e0e52_.log 2025-09-07T07:32:17.4572667Z Running 18 items in this shard: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained, test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_assign, test/dynamo/test_model_output.py::TestModelOutput::test_mo_create, test/dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getitem, test/dynamo/test_model_output.py::TestModelOutput::test_mo_index, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init2, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable, test/dynamo/test_model_output.py::TestModelOutput::test_mo_newkey, test/dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode, test/dynamo/test_model_output.py::TestModelOutput::test_mo_tuple, test/dynamo/test_model_output.py::TestModelOutput::test_none, test/dynamo/test_model_output.py::TestModelOutput::test_reconstruction, test/dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda 2025-09-07T07:32:17.4578817Z 2025-09-07T07:32:17.4579034Z Running test_ops_fwd_gradients 1/1 ... 
[2025-09-07 07:32:17.456614] 2025-09-07T07:32:17.4579467Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:17.4580572Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_fwd_gradients.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:17.456947] 2025-09-07T07:32:18.1619626Z 2025-09-07T07:32:18.1620405Z torch_np/test_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_basic_1.1_e7b4d53e7f6fd1aa_.log 2025-09-07T07:32:18.1727200Z Running 453 items in this shard: test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func0, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func1, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func10, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func11, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func12, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func13, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func14, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func15, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func16, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func17, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func18, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func19, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func2, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func20, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func21, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func22, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func23, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func24, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func25, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func26, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func27, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func28, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func29, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func3, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func30, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func31, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func32, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func33, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func34, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func35, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func36, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func37, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func38, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func39, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func4, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func40, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func41, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func42, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func43, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func44, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func45, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func46, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func47, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func48, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func49, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func5, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func50, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func51, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func52, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func53, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func54, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func55, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func56, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func57, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func58, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func59, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func6, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func60, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func61, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func62, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func63, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func64, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func65, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func66, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func67, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func68, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func69, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func7, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func70, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func71, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func72, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func73, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func74, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func8, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func9, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func0, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func1, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func10, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func11, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func12, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func13, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func14, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func15, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func16, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func17, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func18, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func19, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func2, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func20, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func21, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func22, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func23, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func24, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func25, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func26, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func27, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func28, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func29, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func3, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func30, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func31, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func32, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func33, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func34, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func35, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func36, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func37, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func38, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func39, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func4, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func40, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func41, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func42, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func43, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func44, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func45, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func46, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func47, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func48, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func49, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func5, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func50, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func51, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func52, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func53, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func54, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func55, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func56, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func57, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func58, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func59, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func6, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func60, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func61, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func62, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func63, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func64, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func65, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func66, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func67, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func68, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func69, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func7, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func70, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func71, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func72, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func73, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func74, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func8, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func9, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func0, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func1, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func10, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func11, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func12, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func13, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func14, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func15, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func16, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func17, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func18, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func19, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func2, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func20, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func21, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func22, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func23, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func24, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func25, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func26, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func27, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func28, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func29, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func3, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func30, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func31, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func32, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func33, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func34, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func35, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func36, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func37, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func38, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func39, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func4, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func40, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func41, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func42, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func43, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func44, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func45, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func46, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func47, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func48, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func49, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func5, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func50, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func51, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func52, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func53, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func54, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func55, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func56, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func57, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func58, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func59, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func6, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func60, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func61, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func62, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func63, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func64, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func65, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func66, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func67, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func68, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func69, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func7, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func70, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func71, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func72, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func73, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func74, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func8, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func9, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_array_func0_axes0, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_array_func0_axes1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_array_func0_axes2, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_list_func0_axes0, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_list_func0_axes1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_list_func0_axes2, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_tensor_func0_axes0, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_tensor_func0_axes1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_tensor_func0_axes2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func0, 
test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func1, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func3, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func4, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func0, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func1, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func3, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func4, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func0, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func1, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func3, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func4, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_array_func0_np_func0, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_array_func1_np_func1, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_array_func2_np_func2, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_list_func0_np_func0, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_list_func1_np_func1, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_list_func2_np_func2, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_tensor_func0_np_func0, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_tensor_func1_np_func1, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_tensor_func2_np_func2, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func0, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func1, 
test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func2, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func3, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func0, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func1, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func2, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func3, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func4, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func5, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func6, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_array_func0, 
test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_array_func1, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_list_func0, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_list_func1, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_tensor_func0, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_tensor_func1, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func0_args0, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func1_args1, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func2_args2, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func3_args3, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func4_args4, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func5_args5, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func6_args6, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func7_args7, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func8_args8, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func9_args9, test/torch_np/test_basic.py::TestNormalizations::test_too_few_args_positional, test/torch_np/test_basic.py::TestNormalizations::test_unknown_args, test/torch_np/test_basic.py::TestNormalizations::test_unknown_args_with_defaults, test/torch_np/test_basic.py::TestCopyTo::test_copyto_basic, test/torch_np/test_basic.py::TestCopyTo::test_copyto_typecast, test/torch_np/test_basic.py::TestCopyTo::test_copytobcast, test/torch_np/test_basic.py::TestDivmod::test_divmod_no_out, test/torch_np/test_basic.py::TestDivmod::test_divmod_out, test/torch_np/test_basic.py::TestDivmod::test_divmod_out_both_pos_and_kw, test/torch_np/test_basic.py::TestDivmod::test_divmod_out_list, test/torch_np/test_basic.py::TestDivmod::test_divmod_pos_only, 
test/torch_np/test_basic.py::TestSmokeNotImpl::test_nimpl_basic, test/torch_np/test_basic.py::TestDefaultDtype::test_defaultdtype_defaults, test/torch_np/test_basic.py::TestDefaultDtype::test_set_default_float_dt_float32, test/torch_np/test_basic.py::TestDefaultDtype::test_set_default_float_dt_pytorch, test/torch_np/test_basic.py::TestDefaultDtype::test_set_default_float_float32, test/torch_np/test_basic.py::TestExport::test_exported_objects, test/torch_np/test_basic.py::TestCtorNested::test_arrays_in_lists, test/torch_np/test_basic.py::TestMisc::test_f16_on_cuda, test/torch_np/test_basic.py::TestMisc::test_ndarrays_to_tensors 2025-09-07T07:32:18.1827606Z 2025-09-07T07:32:18.1827789Z Running inductor/test_compile 1/1 ... [2025-09-07 07:32:18.162714] 2025-09-07T07:32:18.1828155Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:18.1829150Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:18.163061] 2025-09-07T07:32:18.2376852Z 2025-09-07T07:32:18.2377687Z test_segment_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_segment_reductions_1.1_c9e9d139e94534a0_.log 2025-09-07T07:32:18.2411601Z Running 74 items in this shard: test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_bfloat16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_bfloat16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_bfloat16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float16_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_unsafe_flag_cuda_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_unsafe_flag_cuda_int64 2025-09-07T07:32:18.2438046Z 2025-09-07T07:32:18.2438204Z Running test_pruning_op 1/1 ... [2025-09-07 07:32:18.238075] 2025-09-07T07:32:18.2438531Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:18.2439485Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_pruning_op.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:18.238437] 2025-09-07T07:32:19.5020372Z 2025-09-07T07:32:19.5021845Z inductor/test_benchmarking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmarking_1.1_40f3b1e2df50675b_.log 2025-09-07T07:32:19.5028881Z Running 12 items in this shard: test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cuda, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cuda 2025-09-07T07:32:19.5034819Z 2025-09-07T07:32:19.5035017Z Running inductor/test_multi_kernel 1/1 ... 
[2025-09-07 07:32:19.502183] 2025-09-07T07:32:19.5035393Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:19.5036307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_multi_kernel.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:19.502552] 2025-09-07T07:32:22.0586261Z 2025-09-07T07:32:22.0587332Z test_pruning_op 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_pruning_op_1.1_e3e72341023b08c6_.log 2025-09-07T07:32:22.0588778Z Running 2 items in this shard: test/test_pruning_op.py::PruningOpTest::test_rowwise_prune_op_32bit_indices, test/test_pruning_op.py::PruningOpTest::test_rowwise_prune_op_64bit_indices 2025-09-07T07:32:22.0589603Z 2025-09-07T07:32:22.0589901Z Running inductor/test_decompose_mem_bound_mm 1/1 ... [2025-09-07 07:32:22.058706] 2025-09-07T07:32:22.0590426Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:22.0592916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_decompose_mem_bound_mm.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:22.059114] 2025-09-07T07:32:24.9372953Z 2025-09-07T07:32:24.9374854Z inductor/test_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_1.1_2cfa74f01341679b_.log 2025-09-07T07:32:24.9380055Z Running 9 items in this shard: test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_generate_debug_symbol, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_bare_module, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_export1, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_export2, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_fx, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_fx_dict_input, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_fx_tensor_return, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_make_fx, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_op_with_multiple_outputs 2025-09-07T07:32:24.9384188Z 2025-09-07T07:32:24.9384476Z Running inductor/test_block_analysis 1/1 ... [2025-09-07 07:32:24.937501] 2025-09-07T07:32:24.9385008Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:24.9386263Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_block_analysis.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:24.937902] 2025-09-07T07:32:26.7268261Z 2025-09-07T07:32:26.7269405Z inductor/test_multi_kernel 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_multi_kernel_1.1_2b309d83574518f3_.log 2025-09-07T07:32:26.7277420Z Running 19 items in this shard: test/inductor/test_multi_kernel.py::MultiKernelTest::test_batchnorm_training, test/inductor/test_multi_kernel.py::MultiKernelTest::test_inplace_update, test/inductor/test_multi_kernel.py::MultiKernelTest::test_layernorm, test/inductor/test_multi_kernel.py::MultiKernelTest::test_pass_same_arg_multi_times, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer_cpp_wrapper, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer_cpp_wrapper_non_persistent_reduction, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer_cpp_wrapper_persistent_reduction, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_cpp_wrapper, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_force_non_persistent_reduction_force_kernel_0, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_force_non_persistent_reduction_force_kernel_1, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_warn_mixed_layout, test/inductor/test_multi_kernel.py::MultiKernelTest::test_sort_disables_multi_kernel, test/inductor/test_multi_kernel.py::MultiKernelTest::test_split_scan, test/inductor/test_multi_kernel.py::MultiKernelTest::test_transformer_snippet, test/inductor/test_multi_kernel.py::MultiKernelTest::test_transformer_snippet_with_fallback_random, test/inductor/test_multi_kernel.py::MultiKernelTest::test_triton_gemm, test/inductor/test_multi_kernel.py::MultiKernelTest::test_triton_relu_fused_gemm 
2025-09-07T07:32:26.7283710Z 2025-09-07T07:32:26.7283922Z Running inductor/test_minifier_isolate 1/1 ... [2025-09-07 07:32:26.726938] 2025-09-07T07:32:26.7284310Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:26.7285262Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_isolate.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:26.727300] 2025-09-07T07:32:29.0331946Z 2025-09-07T07:32:29.0333813Z inductor/test_decompose_mem_bound_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_decompose_mem_bound_mm_1.1_7f13cdbe6ce49c91_.log 2025-09-07T07:32:29.0357149Z Running 37 items in this shard: test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_check_device, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_bmm_b_10240_m_2_k_2_n_2_should_decompose_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_bmm_b_10240_m_2_k_32_n_32_should_decompose_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_bmm_b_2000_m_2_k_2_n_2_should_decompose_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_bmm_cpu_b_1_m_2_k_2_n_2_should_decompose_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_bmm_cpu_b_2_m_2_k_2_n_2_should_decompose_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_m_20480_k_32_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_m_20480_k_32_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_m_20480_k_5_n_2_should_decompose_True_has_bias_False, 
test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_m_20480_k_5_n_2_should_decompose_True_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_m_2048_k_2_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_m_2048_k_2_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_mixed_precision_m_20480_k_32_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_mixed_precision_m_20480_k_32_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_mixed_precision_m_20480_k_5_n_2_should_decompose_True_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_mixed_precision_m_20480_k_5_n_2_should_decompose_True_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_mixed_precision_m_2048_k_2_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_linear_mixed_precision_m_2048_k_2_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_cpu_m_1_k_64_n_16_should_decompose_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_cpu_m_1_k_64_n_32_should_decompose_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_cpu_m_2_k_64_n_16_should_decompose_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_m_20480_k_32_n_2_should_decompose_False_has_bias_False, 
test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_m_20480_k_32_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_m_20480_k_5_n_2_should_decompose_True_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_m_20480_k_5_n_2_should_decompose_True_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_m_2048_k_2_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_m_2048_k_2_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_mixed_precision_m_20480_k_32_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_mixed_precision_m_20480_k_32_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_mixed_precision_m_20480_k_5_n_2_should_decompose_True_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_mixed_precision_m_20480_k_5_n_2_should_decompose_True_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_mixed_precision_m_2048_k_2_n_2_should_decompose_False_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_decompose_mm_mixed_precision_m_2048_k_2_n_2_should_decompose_False_has_bias_True, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_dynamic_shape_decompose_addmm, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_dynamic_shape_m_20480_k_5_n_2_should_decompose_True_has_bias_False, test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_dynamic_shape_m_20480_k_5_n_2_should_decompose_True_has_bias_True, 
test/inductor/test_decompose_mem_bound_mm.py::TestDecomposeMemMM::test_realize_input 2025-09-07T07:32:29.0373014Z 2025-09-07T07:32:29.0373171Z Running export/test_swap 1/1 ... [2025-09-07 07:32:29.033310] 2025-09-07T07:32:29.0373519Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:29.0374464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_swap.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:29.033671] 2025-09-07T07:32:32.0626991Z 2025-09-07T07:32:32.0628191Z inductor/test_block_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_block_analysis_1.1_bdeaeb9de2c0eda0_.log 2025-09-07T07:32:32.0633707Z Running 10 items in this shard: test/inductor/test_block_analysis.py::BlockAnalysisTest::test_affine_identity_stride_3_symbol2_expr2, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_affine_identity_stride_4_symbol1_expr1, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_affine_identity_stride_5_symbol0_expr0, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_index_with_dynamic_shapes, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_mod_div_identity_dims0_strides0_symbol0_expr0, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_mod_div_identity_dims1_strides1_symbol1_expr1, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_mod_div_identity_dims2_strides2_symbol2_expr2, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_subexpr_identity_symbol0_expr0_subexpr0, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_subexpr_identity_symbol1_expr1_subexpr1, test/inductor/test_block_analysis.py::BlockAnalysisTest::test_subexpr_identity_symbol2_expr2_subexpr2 2025-09-07T07:32:32.0637708Z 2025-09-07T07:32:32.0637900Z Running functorch/test_dims 
1/1 ... [2025-09-07 07:32:32.062907] 2025-09-07T07:32:32.0638273Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:32.0639226Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_dims.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:32.063346] 2025-09-07T07:32:32.6533283Z 2025-09-07T07:32:32.6534368Z export/test_swap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_swap_1.1_862812922856a9c2_.log 2025-09-07T07:32:32.6543350Z Running 18 items in this shard: test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_args, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_nonstrict::test_custom_output, test/export/test_swap.py::TestSwap_nonstrict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_nonstrict::test_nested_leaf, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_with_unused_input, test/export/test_swap.py::TestSwap_strict::test_custom_input_args, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_strict::test_custom_output, test/export/test_swap.py::TestSwap_strict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_strict::test_nested_leaf, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_signature, 
test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_with_unused_input 2025-09-07T07:32:32.6550207Z 2025-09-07T07:32:32.6550473Z Running profiler/test_profiler 1/1 ... [2025-09-07 07:32:32.653378] 2025-09-07T07:32:32.6551003Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:32.6552240Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:32.653747] 2025-09-07T07:32:36.0339426Z 2025-09-07T07:32:36.0341231Z functorch/test_dims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_dims_1.1_679ef1231cdee9f9_.log 2025-09-07T07:32:36.0358551Z Running 72 items in this shard: test/functorch/test_dims.py::TestMin::test_adapt, test/functorch/test_dims.py::TestMin::test_attn, test/functorch/test_dims.py::TestMin::test_attn_cuda, test/functorch/test_dims.py::TestMin::test_big_split, test/functorch/test_dims.py::TestMin::test_c, test/functorch/test_dims.py::TestMin::test_compare_dims, test/functorch/test_dims.py::TestMin::test_diag, test/functorch/test_dims.py::TestMin::test_dim_args, test/functorch/test_dims.py::TestMin::test_dims_with_size, test/functorch/test_dims.py::TestMin::test_dir, test/functorch/test_dims.py::TestMin::test_doc, test/functorch/test_dims.py::TestMin::test_embed, test/functorch/test_dims.py::TestMin::test_eq, test/functorch/test_dims.py::TestMin::test_expand, test/functorch/test_dims.py::TestMin::test_functorch, test/functorch/test_dims.py::TestMin::test_hello, test/functorch/test_dims.py::TestMin::test_index, test/functorch/test_dims.py::TestMin::test_index_placement, test/functorch/test_dims.py::TestMin::test_inplace, test/functorch/test_dims.py::TestMin::test_manual_stuff, test/functorch/test_dims.py::TestMin::test_mask, 
test/functorch/test_dims.py::TestMin::test_max, test/functorch/test_dims.py::TestMin::test_mm, test/functorch/test_dims.py::TestMin::test_mm_fuse, test/functorch/test_dims.py::TestMin::test_monkey, test/functorch/test_dims.py::TestMin::test_network, test/functorch/test_dims.py::TestMin::test_order, test/functorch/test_dims.py::TestMin::test_order_keyword, test/functorch/test_dims.py::TestMin::test_parse, test/functorch/test_dims.py::TestMin::test_permute_orig, test/functorch/test_dims.py::TestMin::test_seg, test/functorch/test_dims.py::TestMin::test_simple, test/functorch/test_dims.py::TestMin::test_softmax_split, test/functorch/test_dims.py::TestMin::test_stack, test/functorch/test_dims.py::TestMin::test_time_mm_fuse, test/functorch/test_dims.py::TestMin::test_with_dims_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_adapt, test/functorch/test_dims.py::TestMinFunctorchOnly::test_attn, test/functorch/test_dims.py::TestMinFunctorchOnly::test_attn_cuda, test/functorch/test_dims.py::TestMinFunctorchOnly::test_big_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_c, test/functorch/test_dims.py::TestMinFunctorchOnly::test_compare_dims, test/functorch/test_dims.py::TestMinFunctorchOnly::test_diag, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dim_args, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dims_with_size, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dir, test/functorch/test_dims.py::TestMinFunctorchOnly::test_doc, test/functorch/test_dims.py::TestMinFunctorchOnly::test_embed, test/functorch/test_dims.py::TestMinFunctorchOnly::test_eq, test/functorch/test_dims.py::TestMinFunctorchOnly::test_expand, test/functorch/test_dims.py::TestMinFunctorchOnly::test_functorch, test/functorch/test_dims.py::TestMinFunctorchOnly::test_hello, test/functorch/test_dims.py::TestMinFunctorchOnly::test_index, test/functorch/test_dims.py::TestMinFunctorchOnly::test_index_placement, 
test/functorch/test_dims.py::TestMinFunctorchOnly::test_inplace, test/functorch/test_dims.py::TestMinFunctorchOnly::test_manual_stuff, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mask, test/functorch/test_dims.py::TestMinFunctorchOnly::test_max, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mm, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mm_fuse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_monkey, test/functorch/test_dims.py::TestMinFunctorchOnly::test_network, test/functorch/test_dims.py::TestMinFunctorchOnly::test_order, test/functorch/test_dims.py::TestMinFunctorchOnly::test_order_keyword, test/functorch/test_dims.py::TestMinFunctorchOnly::test_parse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_permute_orig, test/functorch/test_dims.py::TestMinFunctorchOnly::test_seg, test/functorch/test_dims.py::TestMinFunctorchOnly::test_simple, test/functorch/test_dims.py::TestMinFunctorchOnly::test_softmax_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_stack, test/functorch/test_dims.py::TestMinFunctorchOnly::test_time_mm_fuse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_with_dims_split 2025-09-07T07:32:36.0372174Z 2025-09-07T07:32:36.0372369Z Running inductor/test_op_dtype_prop 1/1 ... [2025-09-07 07:32:36.034022] 2025-09-07T07:32:36.0372750Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:36.0373664Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_dtype_prop.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:36.034394] 2025-09-07T07:32:36.5737751Z 2025-09-07T07:32:36.5738667Z profiler/test_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_1.1_dc360ebb0e87252e_.log 2025-09-07T07:32:36.5767793Z Running 71 items in this shard: test/profiler/test_profiler.py::TestProfilerCUDA::test_cudagraph_profiling_workaround, test/profiler/test_profiler.py::TestProfilerCUDA::test_custom_module_input_op_ids, test/profiler/test_profiler.py::TestProfilerCUDA::test_mem_leak, test/profiler/test_profiler.py::TestProfilerITT::test_custom_module_input_op_ids, test/profiler/test_profiler.py::TestProfiler::test_basic_chrome_trace, test/profiler/test_profiler.py::TestProfiler::test_basic_profile, test/profiler/test_profiler.py::TestProfiler::test_concrete_inputs_profiling, test/profiler/test_profiler.py::TestProfiler::test_concrete_inputs_profiling_toggling, test/profiler/test_profiler.py::TestProfiler::test_cpu_annotation_overlap, test/profiler/test_profiler.py::TestProfiler::test_disable_external_correlation, test/profiler/test_profiler.py::TestProfiler::test_dynamic_toggle, test/profiler/test_profiler.py::TestProfiler::test_event_list, test/profiler/test_profiler.py::TestProfiler::test_export_stacks, test/profiler/test_profiler.py::TestProfiler::test_flops, test/profiler/test_profiler.py::TestProfiler::test_forked_process, test/profiler/test_profiler.py::TestProfiler::test_guarded_record_function_fast, test/profiler/test_profiler.py::TestProfiler::test_high_level_trace, test/profiler/test_profiler.py::TestProfiler::test_is_profiler_enabled, test/profiler/test_profiler.py::TestProfiler::test_kineto, test/profiler/test_profiler.py::TestProfiler::test_kineto_multigpu, test/profiler/test_profiler.py::TestProfiler::test_kineto_profiler_api, test/profiler/test_profiler.py::TestProfiler::test_kineto_profiler_multiple_steppers, 
test/profiler/test_profiler.py::TestProfiler::test_kineto_profiler_with_environment_variable, test/profiler/test_profiler.py::TestProfiler::test_lazy_build_tree, test/profiler/test_profiler.py::TestProfiler::test_memory_profiler, test/profiler/test_profiler.py::TestProfiler::test_module_hierarchy, test/profiler/test_profiler.py::TestProfiler::test_nested_tensor_with_shapes, test/profiler/test_profiler.py::TestProfiler::test_oom_tracing, test/profiler/test_profiler.py::TestProfiler::test_override_time_units, test/profiler/test_profiler.py::TestProfiler::test_profile_all_threads, test/profiler/test_profiler.py::TestProfiler::test_profiler_correlation_id, test/profiler/test_profiler.py::TestProfiler::test_profiler_cuda_sync_events, test/profiler/test_profiler.py::TestProfiler::test_profiler_disable_fwd_bwd_link, test/profiler/test_profiler.py::TestProfiler::test_profiler_fwd_bwd_link, test/profiler/test_profiler.py::TestProfiler::test_profiler_metadata, test/profiler/test_profiler.py::TestProfiler::test_profiler_op_event_args, test/profiler/test_profiler.py::TestProfiler::test_profiler_op_event_kwargs, test/profiler/test_profiler.py::TestProfiler::test_profiler_strides, test/profiler/test_profiler.py::TestProfiler::test_profiler_time_scale, test/profiler/test_profiler.py::TestProfiler::test_profiler_tracing, test/profiler/test_profiler.py::TestProfiler::test_profiler_type, test/profiler/test_profiler.py::TestProfiler::test_python_gc_event, test/profiler/test_profiler.py::TestProfiler::test_record_function_fast, test/profiler/test_profiler.py::TestProfiler::test_schedule_function_count, test/profiler/test_profiler.py::TestProfiler::test_skip_first_wait, test/profiler/test_profiler.py::TestProfiler::test_source, test/profiler/test_profiler.py::TestProfiler::test_tensorboard_trace_handler, test/profiler/test_profiler.py::TestProfiler::test_user_annotation, test/profiler/test_profiler.py::TestExperimentalUtils::test_bfs, 
test/profiler/test_profiler.py::TestExperimentalUtils::test_dfs, test/profiler/test_profiler.py::TestExperimentalUtils::test_fuzz_symbolize, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_conv2d_bias_followed_by_batchnorm2d_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_debug_autotuner, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_extra_cuda_copy_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_extra_cuda_copy_pattern_benchmark, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_for_loop_indexing_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_fp32_matmul_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_grad_not_set_to_none_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_matmul_dim_fp16_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_name_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_optimizer_single_tensor_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_overload_names, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_pattern_match_helper, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_pattern_matcher_json_report, test/profiler/test_profiler.py::TestExperimentalUtils::test_profiler_synchronized_dataloader_pattern, test/profiler/test_profiler.py::TestExperimentalUtils::test_utils_compute_idle_time, test/profiler/test_profiler.py::TestExperimentalUtils::test_utils_compute_queue_depth, test/profiler/test_profiler.py::TestExperimentalUtils::test_utils_compute_queue_depth_when_no_cuda_events, test/profiler/test_profiler.py::TestExperimentalUtils::test_utils_compute_self_time, test/profiler/test_profiler.py::TestExperimentalUtils::test_utils_get_optimizable_events, 
test/profiler/test_profiler.py::TestExperimentalUtils::test_utils_intervals_overlap 2025-09-07T07:32:36.5786620Z 2025-09-07T07:32:36.5786804Z Running test_tensorexpr_pybind 1/1 ... [2025-09-07 07:32:36.573848] 2025-09-07T07:32:36.5787172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:36.5788072Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_tensorexpr_pybind.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:36.574209] 2025-09-07T07:32:40.3440990Z 2025-09-07T07:32:40.3442246Z test_tensorexpr_pybind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_pybind_1.1_675831dbcc69d36d_.log 2025-09-07T07:32:40.3450071Z Running 17 items in this shard: test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_alloc_in_loop, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_call_raw, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dtype_error, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dynamic_shape, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dynamic_shape_2d, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_external_calls, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_shape_prop, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_shape_prop_module, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_custom_lowering, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_expand, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_permute, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_scalar_inputs, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_t, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_tensor_inputs, 
test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_transpose, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_simple_sum, test/test_tensorexpr_pybind.py::TestExprHandlePyBind::test_unary_ops 2025-09-07T07:32:40.3455450Z 2025-09-07T07:32:40.3455690Z Running inductor/test_split_cat_fx_aten_passes 1/1 ... [2025-09-07 07:32:40.344102] 2025-09-07T07:32:40.3456122Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:40.3457136Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_split_cat_fx_aten_passes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:40.344457] 2025-09-07T07:32:45.8125490Z 2025-09-07T07:32:45.8126688Z inductor/test_op_dtype_prop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_dtype_prop_1.1_bf1369c2a4489db2_.log 2025-09-07T07:32:45.8320928Z Running 567 items in this shard: test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_any_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_assoc_scan_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_binary_math_mixed_precision_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_codegen_upcast_to_fp32_upcast_to_fp32_False_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_codegen_upcast_to_fp32_upcast_to_fp32_True_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_constant_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_downcast_div_mod_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_abs_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_abs_load_upcast_to_fp32_False_float16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_abs_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_abs_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acos_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acos_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acos_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acos_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acosh_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acosh_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acosh_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acosh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asin_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asin_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asin_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asin_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asinh_load_upcast_to_fp32_False_bfloat16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asinh_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asinh_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asinh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan2_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan2_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan2_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan2_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atanh_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atanh_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atanh_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atanh_load_upcast_to_fp32_True_float16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_ceil_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_ceil_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_ceil_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_ceil_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_copysign_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_copysign_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_copysign_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_copysign_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cosh_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cosh_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cosh_load_upcast_to_fp32_True_bfloat16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cosh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfc_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfc_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfc_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfc_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfinv_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfinv_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfinv_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfinv_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp2_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp2_load_upcast_to_fp32_False_float16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp2_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp2_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_expm1_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_expm1_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_expm1_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_expm1_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_floor_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_floor_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_floor_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_floor_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_fmod_load_upcast_to_fp32_False_bfloat16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_fmod_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_fmod_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_fmod_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isnan_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isnan_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isnan_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isnan_load_upcast_to_fp32_True_float16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log10_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log10_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log10_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log10_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log1p_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log1p_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log1p_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log1p_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log2_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log2_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log2_load_upcast_to_fp32_True_bfloat16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log2_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_nextafter_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_nextafter_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_nextafter_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_nextafter_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_round_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_round_load_upcast_to_fp32_False_float16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_round_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_round_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_rsqrt_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_rsqrt_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_rsqrt_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_rsqrt_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sigmoid_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sigmoid_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sigmoid_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sigmoid_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sin_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sin_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sin_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sin_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sinh_load_upcast_to_fp32_False_bfloat16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sinh_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sinh_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sinh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tan_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tan_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tan_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tan_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tanh_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tanh_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tanh_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tanh_load_upcast_to_fp32_True_float16_cuda, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_abs_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_abs_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_abs_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_abs_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_abs_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_int64, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan2_cuda_bool, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan2_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan2_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan2_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan2_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_and_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_and_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_and_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_left_shift_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_left_shift_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_not_cuda_bool, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_not_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_not_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_or_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_or_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_or_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_right_shift_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_right_shift_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_xor_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_xor_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_xor_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ceil_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ceil_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ceil_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ceil_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_bool, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_copysign_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_copysign_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_copysign_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_copysign_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_copysign_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_bool, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_floor_rounding_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_floor_rounding_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_floor_rounding_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_floor_rounding_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_trunc_rounding_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_trunc_rounding_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_trunc_rounding_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_trunc_rounding_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_floor_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_floor_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_floor_cuda_int32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_floor_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_fmod_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_fmod_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_fmod_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_fmod_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_frexp_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_frexp_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gcd_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gcd_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_hypot_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_hypot_cuda_float64, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igamma_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igamma_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igammac_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igammac_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log10_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log10_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log10_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log10_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log10_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ne_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ne_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ne_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ne_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ne_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_neg_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_neg_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_neg_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_neg_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_nextafter_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_nextafter_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_1_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_1_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_1_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_1_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_1_cuda_int64, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_3_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_3_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_3_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_3_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_pow_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_pow_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_pow_cuda_int32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_pow_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_0_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_0_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_3_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_3_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_neg_3_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_neg_3_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_int32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sqrt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sqrt_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sqrt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sqrt_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sqrt_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_int64, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_true_divide_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_true_divide_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_true_divide_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_true_divide_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_true_divide_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_trunc_cuda_float32, 
test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_trunc_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_trunc_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_trunc_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_support_cuda 2025-09-07T07:32:45.8502437Z 2025-09-07T07:32:45.8502610Z Running dynamo/test_misc 1/1 ... [2025-09-07 07:32:45.813520] 2025-09-07T07:32:45.8502957Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:45.8503840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_misc.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:45.813865] 2025-09-07T07:32:47.2682999Z 2025-09-07T07:32:47.2684212Z inductor/test_split_cat_fx_aten_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_split_cat_fx_aten_passes_1.1_eff874e16d505fcd_.log 2025-09-07T07:32:47.2687251Z Running 5 items in this shard: test/inductor/test_split_cat_fx_aten_passes.py::TestSplitCatAten::test_move_view_after_cat_aten, test/inductor/test_split_cat_fx_aten_passes.py::TestSplitCatAten::test_select_cat_post_grad, test/inductor/test_split_cat_fx_aten_passes.py::TestSplitCatAten::test_split_cat_post_grad, test/inductor/test_split_cat_fx_aten_passes.py::TestSplitCatAten::test_split_cat_post_grad_singular, test/inductor/test_split_cat_fx_aten_passes.py::TestSplitCatAtenNormalizationPasses::test_split_aten_normalization 2025-09-07T07:32:47.2693099Z 2025-09-07T07:32:47.2693336Z Running inductor/test_loop_ordering 1/1 ... [2025-09-07 07:32:47.268335] 2025-09-07T07:32:47.2693783Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:47.2694999Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_loop_ordering.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:47.268709] 2025-09-07T07:32:52.3377287Z 2025-09-07T07:32:52.3378575Z dynamo/test_misc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_misc_1.1_97a2e063eba4cbfe_.log 2025-09-07T07:32:52.3512089Z Running 610 items in this shard: test/dynamo/test_misc.py::MiscTests::test_312_binary_slice_with_graph_break1, test/dynamo/test_misc.py::MiscTests::test_312_binary_slice_with_graph_break2, test/dynamo/test_misc.py::MiscTests::test_RAISE_VARARGS_0, test/dynamo/test_misc.py::MiscTests::test_T_tensor_attribute, test/dynamo/test_misc.py::MiscTests::test_add_sizes, test/dynamo/test_misc.py::MiscTests::test_add_to_set, test/dynamo/test_misc.py::MiscTests::test_anomaly_aot_autograd, test/dynamo/test_misc.py::MiscTests::test_any_all_symnode, test/dynamo/test_misc.py::MiscTests::test_aot_autograd_propagate_unbacked_symints_shape, test/dynamo/test_misc.py::MiscTests::test_arange_length_with_float32_dtype, test/dynamo/test_misc.py::MiscTests::test_argwhere_with_dynamic_shapes, test/dynamo/test_misc.py::MiscTests::test_assert, test/dynamo/test_misc.py::MiscTests::test_assert_size_stride, test/dynamo/test_misc.py::MiscTests::test_assigning_function_to_class_attribute, test/dynamo/test_misc.py::MiscTests::test_assigning_function_to_object_attribute, test/dynamo/test_misc.py::MiscTests::test_backend_match_guard, test/dynamo/test_misc.py::MiscTests::test_backend_match_guard_multi_threads, test/dynamo/test_misc.py::MiscTests::test_backward_deterministic_mode_mismatch_warning, test/dynamo/test_misc.py::MiscTests::test_boolarg, test/dynamo/test_misc.py::MiscTests::test_bound_shape_checks, test/dynamo/test_misc.py::MiscTests::test_build_tuple_unpack, test/dynamo/test_misc.py::MiscTests::test_builder_for_class_with_metaclass, test/dynamo/test_misc.py::MiscTests::test_builtin_abs, test/dynamo/test_misc.py::MiscTests::test_builtin_bool_on_symbool, test/dynamo/test_misc.py::MiscTests::test_builtin_bool_on_symfloat, 
test/dynamo/test_misc.py::MiscTests::test_builtin_bool_on_symint, test/dynamo/test_misc.py::MiscTests::test_builtin_complex, test/dynamo/test_misc.py::MiscTests::test_builtin_complex_args, test/dynamo/test_misc.py::MiscTests::test_builtin_isinstance, test/dynamo/test_misc.py::MiscTests::test_builtin_str_on_user_defined_function, test/dynamo/test_misc.py::MiscTests::test_builtin_subclasses_as_method_on_class_type, test/dynamo/test_misc.py::MiscTests::test_builtin_subclasses_as_method_on_var, test/dynamo/test_misc.py::MiscTests::test_call_parent_non_class_methods_from_child, test/dynamo/test_misc.py::MiscTests::test_callpacked, test/dynamo/test_misc.py::MiscTests::test_cannot_trace_mark_dynamic, test/dynamo/test_misc.py::MiscTests::test_cannot_trace_mark_dynamic_safe_unreached, test/dynamo/test_misc.py::MiscTests::test_cast, test/dynamo/test_misc.py::MiscTests::test_cat_unbacked, test/dynamo/test_misc.py::MiscTests::test_catch_watchings1, test/dynamo/test_misc.py::MiscTests::test_catch_watchings2, test/dynamo/test_misc.py::MiscTests::test_cell_captured_by_existing_func_but_not_root_frame, test/dynamo/test_misc.py::MiscTests::test_cell_output1, test/dynamo/test_misc.py::MiscTests::test_cell_output2, test/dynamo/test_misc.py::MiscTests::test_class_binop, test/dynamo/test_misc.py::MiscTests::test_class_duner_flags, test/dynamo/test_misc.py::MiscTests::test_class_duner_mro, test/dynamo/test_misc.py::MiscTests::test_class_has_instancecheck_method, test/dynamo/test_misc.py::MiscTests::test_clone_sparse_input, test/dynamo/test_misc.py::MiscTests::test_closure_out_of_scope_cell, test/dynamo/test_misc.py::MiscTests::test_closure_out_of_scope_cell_with_cond, test/dynamo/test_misc.py::MiscTests::test_closure_out_of_scope_cell_with_mutation, test/dynamo/test_misc.py::MiscTests::test_closure_recompiles, test/dynamo/test_misc.py::MiscTests::test_closure_with_mutation_and_graph_break, test/dynamo/test_misc.py::MiscTests::test_closure_write_across_functions, 
test/dynamo/test_misc.py::MiscTests::test_compare_shapes_eq, test/dynamo/test_misc.py::MiscTests::test_compare_shapes_neq, test/dynamo/test_misc.py::MiscTests::test_compare_shapes_tuple_eq, test/dynamo/test_misc.py::MiscTests::test_compare_shapes_tuple_neq, test/dynamo/test_misc.py::MiscTests::test_compare_shapes_with_constant, test/dynamo/test_misc.py::MiscTests::test_compare_tensor_with_none, test/dynamo/test_misc.py::MiscTests::test_compilation_metrics_size_limit, test/dynamo/test_misc.py::MiscTests::test_cond, test/dynamo/test_misc.py::MiscTests::test_cond_export, test/dynamo/test_misc.py::MiscTests::test_cond_export_single_arg, test/dynamo/test_misc.py::MiscTests::test_cond_nested, test/dynamo/test_misc.py::MiscTests::test_cond_side_effects, test/dynamo/test_misc.py::MiscTests::test_cond_with_quantization, test/dynamo/test_misc.py::MiscTests::test_conditional_list_comp_in_context, test/dynamo/test_misc.py::MiscTests::test_config_getattr_default, test/dynamo/test_misc.py::MiscTests::test_config_obj, test/dynamo/test_misc.py::MiscTests::test_const_dict_variable_python_type, test/dynamo/test_misc.py::MiscTests::test_constant_getattr, test/dynamo/test_misc.py::MiscTests::test_cross_entropy_loss_fancy_ctor1, test/dynamo/test_misc.py::MiscTests::test_cross_entropy_loss_fancy_ctor2, test/dynamo/test_misc.py::MiscTests::test_cross_entropy_loss_simple_ctor, test/dynamo/test_misc.py::MiscTests::test_custom_dict, test/dynamo/test_misc.py::MiscTests::test_custom_module_free, test/dynamo/test_misc.py::MiscTests::test_data_access_in_inference_mode, test/dynamo/test_misc.py::MiscTests::test_data_ptr_graph_break_aten, test/dynamo/test_misc.py::MiscTests::test_data_ptr_graph_break_builtin, test/dynamo/test_misc.py::MiscTests::test_dataclass, test/dynamo/test_misc.py::MiscTests::test_dataclass_fields, test/dynamo/test_misc.py::MiscTests::test_dataclass_local_hasattr, test/dynamo/test_misc.py::MiscTests::test_default_args_device_dtype, 
test/dynamo/test_misc.py::MiscTests::test_default_dtype_change, test/dynamo/test_misc.py::MiscTests::test_defaultdict, test/dynamo/test_misc.py::MiscTests::test_deque_append_left, test/dynamo/test_misc.py::MiscTests::test_deque_input, test/dynamo/test_misc.py::MiscTests::test_derpy_nn_module_usage, test/dynamo/test_misc.py::MiscTests::test_descriptor, test/dynamo/test_misc.py::MiscTests::test_descriptor_side_effect, test/dynamo/test_misc.py::MiscTests::test_deterministic_algorithms_mutated, test/dynamo/test_misc.py::MiscTests::test_dictcomp, test/dynamo/test_misc.py::MiscTests::test_disable_flag, test/dynamo/test_misc.py::MiscTests::test_dtypes_no_graphbreaks, test/dynamo/test_misc.py::MiscTests::test_dunder_methods, test/dynamo/test_misc.py::MiscTests::test_dunder_new_function_inlining, test/dynamo/test_misc.py::MiscTests::test_dunder_new_function_inlining1, test/dynamo/test_misc.py::MiscTests::test_dunder_new_function_inlining2, test/dynamo/test_misc.py::MiscTests::test_dunder_new_function_inlining3, test/dynamo/test_misc.py::MiscTests::test_dunder_new_function_inlining4, test/dynamo/test_misc.py::MiscTests::test_dunder_weakref, test/dynamo/test_misc.py::MiscTests::test_duplicate_graph_break_log, test/dynamo/test_misc.py::MiscTests::test_dynamic_one_hot, test/dynamo/test_misc.py::MiscTests::test_dynamic_shapes_as_strided, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_dynamic_override, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_dynamic_override_regex, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_force_parameter_static_shapes_and_property_static_shapes_override, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_graph_break, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_int, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_precedence_over_int_specialization, test/dynamo/test_misc.py::MiscTests::test_dynamic_sources_tensor, test/dynamo/test_misc.py::MiscTests::test_dynamo_cache_invalidate, 
test/dynamo/test_misc.py::MiscTests::test_dynamo_cache_move_to_front, test/dynamo/test_misc.py::MiscTests::test_dynamo_compiling_fake_tensor_to_vararg_int, test/dynamo/test_misc.py::MiscTests::test_dynamo_disabled_in_custom_op_kernels, test/dynamo/test_misc.py::MiscTests::test_dynamo_min_operator_with_shape, test/dynamo/test_misc.py::MiscTests::test_dynamo_reset_clears_cache, test/dynamo/test_misc.py::MiscTests::test_empty_list, test/dynamo/test_misc.py::MiscTests::test_enum_as_dict_key, test/dynamo/test_misc.py::MiscTests::test_enum_as_dict_key_with_overloaded_str, test/dynamo/test_misc.py::MiscTests::test_enum_guards, test/dynamo/test_misc.py::MiscTests::test_enum_method, test/dynamo/test_misc.py::MiscTests::test_enum_no_graphbreaks, test/dynamo/test_misc.py::MiscTests::test_enum_subclass, test/dynamo/test_misc.py::MiscTests::test_error_on_nested_fx_trace, test/dynamo/test_misc.py::MiscTests::test_error_on_recompile, test/dynamo/test_misc.py::MiscTests::test_escaping_closure_var_with_backward_hook, test/dynamo/test_misc.py::MiscTests::test_escaping_closure_var_with_nonlocal_var, test/dynamo/test_misc.py::MiscTests::test_existing_func_that_creates_capturing_nested_func, test/dynamo/test_misc.py::MiscTests::test_fail_on_recompile_error_message, test/dynamo/test_misc.py::MiscTests::test_flat_name_to_original_fqn, test/dynamo/test_misc.py::MiscTests::test_float_speculation_log_divergence, test/dynamo/test_misc.py::MiscTests::test_fn_hasattr__name__1, test/dynamo/test_misc.py::MiscTests::test_fn_hasattr__name__2, test/dynamo/test_misc.py::MiscTests::test_fn_hasattr__name__3, test/dynamo/test_misc.py::MiscTests::test_fold, test/dynamo/test_misc.py::MiscTests::test_free_var_and_local_name_collision, test/dynamo/test_misc.py::MiscTests::test_frozen_dataclass_attr_access, test/dynamo/test_misc.py::MiscTests::test_frozen_dataclass_default_factory, test/dynamo/test_misc.py::MiscTests::test_frozen_dataclass_default_value, 
test/dynamo/test_misc.py::MiscTests::test_frozen_dataclass_hashable, test/dynamo/test_misc.py::MiscTests::test_frozen_dataclass_kw_only, test/dynamo/test_misc.py::MiscTests::test_frozen_dict, test/dynamo/test_misc.py::MiscTests::test_frozenset_of_non_literals, test/dynamo/test_misc.py::MiscTests::test_frozenset_torch_func_contains, test/dynamo/test_misc.py::MiscTests::test_fullgraph_capture, test/dynamo/test_misc.py::MiscTests::test_funcname_cache, test/dynamo/test_misc.py::MiscTests::test_function_annotation, test/dynamo/test_misc.py::MiscTests::test_function_generic_alias_annotation, test/dynamo/test_misc.py::MiscTests::test_generate_tensor_from_list_of_numpy_primitive_type, test/dynamo/test_misc.py::MiscTests::test_generate_trivial_abstract_impl, test/dynamo/test_misc.py::MiscTests::test_get_attr_function, test/dynamo/test_misc.py::MiscTests::test_get_cache_entry, test/dynamo/test_misc.py::MiscTests::test_get_custom_tensor_attribute, test/dynamo/test_misc.py::MiscTests::test_get_instruction_source_311, test/dynamo/test_misc.py::MiscTests::test_getattr_dict, test/dynamo/test_misc.py::MiscTests::test_getattrvariable_as_python_constant, test/dynamo/test_misc.py::MiscTests::test_getset_descriptor, test/dynamo/test_misc.py::MiscTests::test_global_state_guard_serialization, test/dynamo/test_misc.py::MiscTests::test_grad, test/dynamo/test_misc.py::MiscTests::test_grad_non_none, test/dynamo/test_misc.py::MiscTests::test_grad_none, test/dynamo/test_misc.py::MiscTests::test_grad_state_mutated, test/dynamo/test_misc.py::MiscTests::test_graph_break_compilation_metrics, test/dynamo/test_misc.py::MiscTests::test_graph_break_compilation_metrics_on_failure, test/dynamo/test_misc.py::MiscTests::test_graph_break_correctly_when_passing_numpy_ndarray_to_torch_function, test/dynamo/test_misc.py::MiscTests::test_guard_failure_fn, test/dynamo/test_misc.py::MiscTests::test_guard_failure_fn2, test/dynamo/test_misc.py::MiscTests::test_guard_failure_fn_shape_control, 
test/dynamo/test_misc.py::MiscTests::test_guard_failure_fn_tensor_iter, test/dynamo/test_misc.py::MiscTests::test_guard_filter_fn_by_id, test/dynamo/test_misc.py::MiscTests::test_guard_filter_fn_by_is_global, test/dynamo/test_misc.py::MiscTests::test_guard_filter_fn_by_name_and_value, test/dynamo/test_misc.py::MiscTests::test_guard_filter_globals, test/dynamo/test_misc.py::MiscTests::test_guard_filter_inbuilt_nn_modules, test/dynamo/test_misc.py::MiscTests::test_guard_filter_nn_modules, test/dynamo/test_misc.py::MiscTests::test_guard_filter_tensors, test/dynamo/test_misc.py::MiscTests::test_guard_function_builder_with_cse, test/dynamo/test_misc.py::MiscTests::test_guard_size_oblivious, test/dynamo/test_misc.py::MiscTests::test_guard_size_oblivious_backed, test/dynamo/test_misc.py::MiscTests::test_guard_size_oblivious_simplification, test/dynamo/test_misc.py::MiscTests::test_guard_sym_node_fstring_when_used, test/dynamo/test_misc.py::MiscTests::test_guards_cse_pass_multiple, test/dynamo/test_misc.py::MiscTests::test_guards_cse_pass_single, test/dynamo/test_misc.py::MiscTests::test_guards_strip_function_call, test/dynamo/test_misc.py::MiscTests::test_hasattr_nn_module_guard, test/dynamo/test_misc.py::MiscTests::test_hash_getitem_slice, test/dynamo/test_misc.py::MiscTests::test_hash_hop, test/dynamo/test_misc.py::MiscTests::test_id_guarded_class, test/dynamo/test_misc.py::MiscTests::test_id_guarded_module, test/dynamo/test_misc.py::MiscTests::test_id_guarded_object, test/dynamo/test_misc.py::MiscTests::test_id_of_nn_module, test/dynamo/test_misc.py::MiscTests::test_id_tensor, test/dynamo/test_misc.py::MiscTests::test_if_cond_nn_mod1, test/dynamo/test_misc.py::MiscTests::test_if_cond_nn_mod2, test/dynamo/test_misc.py::MiscTests::test_if_cond_nn_mod3, test/dynamo/test_misc.py::MiscTests::test_if_cond_user_defined_object, test/dynamo/test_misc.py::MiscTests::test_if_cond_user_defined_object2, test/dynamo/test_misc.py::MiscTests::test_if_cond_user_defined_object3, 
test/dynamo/test_misc.py::MiscTests::test_inference_mode, test/dynamo/test_misc.py::MiscTests::test_inference_mode_param, test/dynamo/test_misc.py::MiscTests::test_inline_closure_not_loaded_by_parent, test/dynamo/test_misc.py::MiscTests::test_inline_closure_returned_by_another_function_and_captures, test/dynamo/test_misc.py::MiscTests::test_inline_dict_function, test/dynamo/test_misc.py::MiscTests::test_inline_dict_function_passed_as_arg, test/dynamo/test_misc.py::MiscTests::test_inline_dict_mutation, test/dynamo/test_misc.py::MiscTests::test_inline_func_jump_on_tensor_condition, test/dynamo/test_misc.py::MiscTests::test_inline_list_mutation, test/dynamo/test_misc.py::MiscTests::test_inline_local_dict_clear, test/dynamo/test_misc.py::MiscTests::test_inline_module_attr_dict_clear, test/dynamo/test_misc.py::MiscTests::test_inline_user_defined_dict_attr_clear, test/dynamo/test_misc.py::MiscTests::test_inplace, test/dynamo/test_misc.py::MiscTests::test_inplace_desugaring, test/dynamo/test_misc.py::MiscTests::test_inplace_param_update, test/dynamo/test_misc.py::MiscTests::test_inplace_view_on_graph_input, test/dynamo/test_misc.py::MiscTests::test_input_cell_mutation, test/dynamo/test_misc.py::MiscTests::test_inspect_signature_bind, test/dynamo/test_misc.py::MiscTests::test_inspect_signature_bind_non_user_function, test/dynamo/test_misc.py::MiscTests::test_inspect_signature_parameters, test/dynamo/test_misc.py::MiscTests::test_int_int_comparisons, test/dynamo/test_misc.py::MiscTests::test_int_list, test/dynamo/test_misc.py::MiscTests::test_int_neg, test/dynamo/test_misc.py::MiscTests::test_int_shape_binops, test/dynamo/test_misc.py::MiscTests::test_int_shape_comparisons, test/dynamo/test_misc.py::MiscTests::test_int_shape_inplace_binops, test/dynamo/test_misc.py::MiscTests::test_intermediary_tensor_grad_access, test/dynamo/test_misc.py::MiscTests::test_invalid_args_builtin, test/dynamo/test_misc.py::MiscTests::test_is_compiling, 
test/dynamo/test_misc.py::MiscTests::test_is_floating_point, test/dynamo/test_misc.py::MiscTests::test_is_floating_point2, test/dynamo/test_misc.py::MiscTests::test_is_tensor, test/dynamo/test_misc.py::MiscTests::test_is_tensor2, test/dynamo/test_misc.py::MiscTests::test_is_tensor_like, test/dynamo/test_misc.py::MiscTests::test_is_tensor_like2, test/dynamo/test_misc.py::MiscTests::test_item, test/dynamo/test_misc.py::MiscTests::test_item_changes, test/dynamo/test_misc.py::MiscTests::test_item_changes_new_shape, test/dynamo/test_misc.py::MiscTests::test_iter_set, test/dynamo/test_misc.py::MiscTests::test_iter_type, test/dynamo/test_misc.py::MiscTests::test_iterator_limit, test/dynamo/test_misc.py::MiscTests::test_itertools_accumulate_symint_default_sum, test/dynamo/test_misc.py::MiscTests::test_itertools_accumulate_tensors_builtins, test/dynamo/test_misc.py::MiscTests::test_itertools_accumulate_tensors_default_sum, test/dynamo/test_misc.py::MiscTests::test_itertools_accumulate_tensors_kwargs, test/dynamo/test_misc.py::MiscTests::test_itertools_accumulate_tensors_user_defined, test/dynamo/test_misc.py::MiscTests::test_itertools_groupby_pure_python_default_identify_func, test/dynamo/test_misc.py::MiscTests::test_itertools_groupby_pure_python_key_func, test/dynamo/test_misc.py::MiscTests::test_itertools_infinite_count, test/dynamo/test_misc.py::MiscTests::test_itertools_infinite_cycle, test/dynamo/test_misc.py::MiscTests::test_itertools_infinite_repeat, test/dynamo/test_misc.py::MiscTests::test_itertools_infinite_repeat_mutation, test/dynamo/test_misc.py::MiscTests::test_itertools_islice, test/dynamo/test_misc.py::MiscTests::test_itertools_islice_default_end, test/dynamo/test_misc.py::MiscTests::test_itertools_islice_default_step, test/dynamo/test_misc.py::MiscTests::test_itertools_repeat, test/dynamo/test_misc.py::MiscTests::test_itertools_tee, test/dynamo/test_misc.py::MiscTests::test_large_reduction_list, test/dynamo/test_misc.py::MiscTests::test_linear_module_free, 
test/dynamo/test_misc.py::MiscTests::test_list_append_return_none, test/dynamo/test_misc.py::MiscTests::test_list_class, test/dynamo/test_misc.py::MiscTests::test_list_hasattr1, test/dynamo/test_misc.py::MiscTests::test_list_hasattr2, test/dynamo/test_misc.py::MiscTests::test_list_iadd_side_effect, test/dynamo/test_misc.py::MiscTests::test_list_iadd_with_shape, test/dynamo/test_misc.py::MiscTests::test_list_iterator_contains, test/dynamo/test_misc.py::MiscTests::test_list_mul, test/dynamo/test_misc.py::MiscTests::test_list_slice_mul, test/dynamo/test_misc.py::MiscTests::test_listcomp, test/dynamo/test_misc.py::MiscTests::test_load_fast_and_clear_graph_break, test/dynamo/test_misc.py::MiscTests::test_mandelbrot_numpy, test/dynamo/test_misc.py::MiscTests::test_map_side_effects, test/dynamo/test_misc.py::MiscTests::test_map_with_quantization, test/dynamo/test_misc.py::MiscTests::test_mark_dynamic_with_ranges, test/dynamo/test_misc.py::MiscTests::test_mark_static, test/dynamo/test_misc.py::MiscTests::test_mark_unbacked_strict, test/dynamo/test_misc.py::MiscTests::test_matmul1, test/dynamo/test_misc.py::MiscTests::test_min_max_over_iterable, test/dynamo/test_misc.py::MiscTests::test_module_complex_iter, test/dynamo/test_misc.py::MiscTests::test_module_deepcopy, test/dynamo/test_misc.py::MiscTests::test_module_not_callable, test/dynamo/test_misc.py::MiscTests::test_mro_type_tensor_no_source, test/dynamo/test_misc.py::MiscTests::test_multiple_inheritance, test/dynamo/test_misc.py::MiscTests::test_mutable_mapping_multiple_inheritance, test/dynamo/test_misc.py::MiscTests::test_named_parameters, test/dynamo/test_misc.py::MiscTests::test_namedtuple1, test/dynamo/test_misc.py::MiscTests::test_namedtuple2, test/dynamo/test_misc.py::MiscTests::test_namedtuple3, test/dynamo/test_misc.py::MiscTests::test_namedtuple_class, test/dynamo/test_misc.py::MiscTests::test_namedtuple_with_custom_getitem, test/dynamo/test_misc.py::MiscTests::test_nan, 
test/dynamo/test_misc.py::MiscTests::test_ne_operator_with_custom_eq, test/dynamo/test_misc.py::MiscTests::test_ne_operator_with_custom_graphbreak_eq, test/dynamo/test_misc.py::MiscTests::test_ne_operator_with_custom_ne, test/dynamo/test_misc.py::MiscTests::test_nested_closure, test/dynamo/test_misc.py::MiscTests::test_nested_closure_mutation, test/dynamo/test_misc.py::MiscTests::test_nested_dataclass_reconstruct, test/dynamo/test_misc.py::MiscTests::test_nested_frozen_dataclass_hashable, test/dynamo/test_misc.py::MiscTests::test_nested_function_resuming_with_correct_globals, test/dynamo/test_misc.py::MiscTests::test_nested_optimize, test/dynamo/test_misc.py::MiscTests::test_nested_optimize_decorator, test/dynamo/test_misc.py::MiscTests::test_nested_optimize_run, test/dynamo/test_misc.py::MiscTests::test_nested_sequential_try, test/dynamo/test_misc.py::MiscTests::test_nested_sequential_try_with, test/dynamo/test_misc.py::MiscTests::test_nested_sequential_try_with_graph_break, test/dynamo/test_misc.py::MiscTests::test_nested_sequential_with, test/dynamo/test_misc.py::MiscTests::test_nested_wraps, test/dynamo/test_misc.py::MiscTests::test_nesteduserfunction_setattr, test/dynamo/test_misc.py::MiscTests::test_new_with_int_list, test/dynamo/test_misc.py::MiscTests::test_newly_constructed_tensor_attr_mutation, test/dynamo/test_misc.py::MiscTests::test_nn_functional_reduction, test/dynamo/test_misc.py::MiscTests::test_nn_module_getattr, test/dynamo/test_misc.py::MiscTests::test_nn_module_getattribute, test/dynamo/test_misc.py::MiscTests::test_nn_sequential_invocation, test/dynamo/test_misc.py::MiscTests::test_nn_sequential_invocation_reposition_indices, test/dynamo/test_misc.py::MiscTests::test_no_error_on_nested_fx_trace, test/dynamo/test_misc.py::MiscTests::test_no_guard_for_unused_sym_node_fstring, test/dynamo/test_misc.py::MiscTests::test_no_raise_guard_partial_constraint, test/dynamo/test_misc.py::MiscTests::test_no_raise_guard_partial_constraint_across_break, 
test/dynamo/test_misc.py::MiscTests::test_non_pt2_compliant_ops_graph_break, test/dynamo/test_misc.py::MiscTests::test_not_dynamic_scope, test/dynamo/test_misc.py::MiscTests::test_numel, test/dynamo/test_misc.py::MiscTests::test_numpy_array_of_arrays, test/dynamo/test_misc.py::MiscTests::test_numpy_as_global, test/dynamo/test_misc.py::MiscTests::test_numpy_fallback_on_eager, test/dynamo/test_misc.py::MiscTests::test_numpy_force, test/dynamo/test_misc.py::MiscTests::test_numpy_gt, test/dynamo/test_misc.py::MiscTests::test_numpy_int_constant, test/dynamo/test_misc.py::MiscTests::test_numpy_iter, test/dynamo/test_misc.py::MiscTests::test_numpy_min, test/dynamo/test_misc.py::MiscTests::test_numpy_ndarray_graph_break, test/dynamo/test_misc.py::MiscTests::test_numpy_ndarray_graph_break_with_multiple_outputs, test/dynamo/test_misc.py::MiscTests::test_numpy_ndarray_works_with_builtin_function, test/dynamo/test_misc.py::MiscTests::test_numpy_no_raise, test/dynamo/test_misc.py::MiscTests::test_numpy_non_torch_dtype, test/dynamo/test_misc.py::MiscTests::test_numpy_random_config_to_numpy, test/dynamo/test_misc.py::MiscTests::test_numpy_readonly, test/dynamo/test_misc.py::MiscTests::test_numpy_recompilation_scalar, test/dynamo/test_misc.py::MiscTests::test_numpy_size_attr, test/dynamo/test_misc.py::MiscTests::test_numpy_subdtype, test/dynamo/test_misc.py::MiscTests::test_numpy_take_along_axis, test/dynamo/test_misc.py::MiscTests::test_numpy_tolist, test/dynamo/test_misc.py::MiscTests::test_numpy_torch_operators, test/dynamo/test_misc.py::MiscTests::test_numpy_ufunc_out, test/dynamo/test_misc.py::MiscTests::test_numpy_ufunc_out_graph_break, test/dynamo/test_misc.py::MiscTests::test_numpy_unique_f16, test/dynamo/test_misc.py::MiscTests::test_numpy_variable_isinstance, test/dynamo/test_misc.py::MiscTests::test_numpy_with_builtin_type, test/dynamo/test_misc.py::MiscTests::test_object_classmethod, test/dynamo/test_misc.py::MiscTests::test_object_setattr, 
test/dynamo/test_misc.py::MiscTests::test_object_staticmethod, test/dynamo/test_misc.py::MiscTests::test_onnx_shape_as_tensor, test/dynamo/test_misc.py::MiscTests::test_optimize_on_module, test/dynamo/test_misc.py::MiscTests::test_ordered_dict_alias_reconstruct, test/dynamo/test_misc.py::MiscTests::test_ordered_dict_move_to_end, test/dynamo/test_misc.py::MiscTests::test_os_environ_get, test/dynamo/test_misc.py::MiscTests::test_os_environ_set_graph_break, test/dynamo/test_misc.py::MiscTests::test_out_variant_custom_op, test/dynamo/test_misc.py::MiscTests::test_out_variants_with_resizing_on_graph_inputs, test/dynamo/test_misc.py::MiscTests::test_out_variants_with_resizing_on_graph_inputs_with_dynamic, test/dynamo/test_misc.py::MiscTests::test_out_variants_with_resizing_on_graph_inputs_with_dynamic1, test/dynamo/test_misc.py::MiscTests::test_outside_linear_module_free, test/dynamo/test_misc.py::MiscTests::test_overridden_getattribute, test/dynamo/test_misc.py::MiscTests::test_packaging_version_parse, test/dynamo/test_misc.py::MiscTests::test_pair, test/dynamo/test_misc.py::MiscTests::test_param_shape_binops, test/dynamo/test_misc.py::MiscTests::test_parameter_free, test/dynamo/test_misc.py::MiscTests::test_patched_builtin_functions, test/dynamo/test_misc.py::MiscTests::test_pep0479_convert_stopiteration, test/dynamo/test_misc.py::MiscTests::test_precompile_entries, test/dynamo/test_misc.py::MiscTests::test_precompile_entry_hit, test/dynamo/test_misc.py::MiscTests::test_precompile_entry_miss, test/dynamo/test_misc.py::MiscTests::test_precompile_fail_on_recompile, test/dynamo/test_misc.py::MiscTests::test_proxy_frozen_dataclass, test/dynamo/test_misc.py::MiscTests::test_pt2_compliant_ops_are_allowed, test/dynamo/test_misc.py::MiscTests::test_pt2_compliant_overload, test/dynamo/test_misc.py::MiscTests::test_pure_python_accumulate, test/dynamo/test_misc.py::MiscTests::test_py_guards_mark_dynamic, test/dynamo/test_misc.py::MiscTests::test_python_slice, 
test/dynamo/test_misc.py::MiscTests::test_raise_guard_full_constraint, test/dynamo/test_misc.py::MiscTests::test_raise_guard_indirect_full_constraint, test/dynamo/test_misc.py::MiscTests::test_raise_guard_partial_constraint_across_break, test/dynamo/test_misc.py::MiscTests::test_raise_guard_partial_constraint_no_graph_break, test/dynamo/test_misc.py::MiscTests::test_raise_on_backend_error, test/dynamo/test_misc.py::MiscTests::test_raises, test/dynamo/test_misc.py::MiscTests::test_raises_importerror1, test/dynamo/test_misc.py::MiscTests::test_raises_importerror2, test/dynamo/test_misc.py::MiscTests::test_range_input, test/dynamo/test_misc.py::MiscTests::test_range_iter_guards, test/dynamo/test_misc.py::MiscTests::test_range_iter_side_effects, test/dynamo/test_misc.py::MiscTests::test_range_with_shape, test/dynamo/test_misc.py::MiscTests::test_real_imag_tensor_attribute, test/dynamo/test_misc.py::MiscTests::test_recompile_message_on_parameter, test/dynamo/test_misc.py::MiscTests::test_recompile_on_disable_1, test/dynamo/test_misc.py::MiscTests::test_recompile_on_disable_2, test/dynamo/test_misc.py::MiscTests::test_recompile_on_global_state_change, test/dynamo/test_misc.py::MiscTests::test_reconstruct_frozen_dataclass, test/dynamo/test_misc.py::MiscTests::test_reconstruct_set_across_graph_break, test/dynamo/test_misc.py::MiscTests::test_recursion_depth_guards, test/dynamo/test_misc.py::MiscTests::test_recursive_inline_list_mutation, test/dynamo/test_misc.py::MiscTests::test_recursive_tensor_attribute, test/dynamo/test_misc.py::MiscTests::test_release_input_memory, test/dynamo/test_misc.py::MiscTests::test_release_module_memory, test/dynamo/test_misc.py::MiscTests::test_release_scope_memory, test/dynamo/test_misc.py::MiscTests::test_remove_set, test/dynamo/test_misc.py::MiscTests::test_repeat_interleave_graphbreaks, test/dynamo/test_misc.py::MiscTests::test_repro_graph_breaks_in__get_item_by_idx, test/dynamo/test_misc.py::MiscTests::test_restore_graphstate, 
test/dynamo/test_misc.py::MiscTests::test_return_dict_with_graph_break_and_update, test/dynamo/test_misc.py::MiscTests::test_return_nested_function, test/dynamo/test_misc.py::MiscTests::test_returning_func_with_captured_func_and_tensor, test/dynamo/test_misc.py::MiscTests::test_returning_nested_func_with_captured_tensor, test/dynamo/test_misc.py::MiscTests::test_running_func_with_captured_func_and_tensor, test/dynamo/test_misc.py::MiscTests::test_running_nested_func_with_captured_tensor, test/dynamo/test_misc.py::MiscTests::test_runtime_assert_replacement, test/dynamo/test_misc.py::MiscTests::test_sample_input, test/dynamo/test_misc.py::MiscTests::test_scalar_device_movement, test/dynamo/test_misc.py::MiscTests::test_scalar_tensor_is_equivalent_to_int_list_argument, test/dynamo/test_misc.py::MiscTests::test_scalar_tensor_is_equivalent_to_symint_argument, test/dynamo/test_misc.py::MiscTests::test_scalar_tensor_is_equivalent_to_symint_list_argument, test/dynamo/test_misc.py::MiscTests::test_sequential_module_free, test/dynamo/test_misc.py::MiscTests::test_set_aliasing_recompiles, test/dynamo/test_misc.py::MiscTests::test_set_custom_tensor_attribute, test/dynamo/test_misc.py::MiscTests::test_set_descriptor, test/dynamo/test_misc.py::MiscTests::test_set_discard, test/dynamo/test_misc.py::MiscTests::test_set_update, test/dynamo/test_misc.py::MiscTests::test_setattr_mutation1, test/dynamo/test_misc.py::MiscTests::test_setattr_mutation2, test/dynamo/test_misc.py::MiscTests::test_setattr_mutation3, test/dynamo/test_misc.py::MiscTests::test_shape_and_tuple_equality, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_constructor, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_create_symbolic_sizes_strides_storage_offset, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_empty, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_evaluate_expr_divisible, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_evaluate_expr_refinement, 
test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_evaluate_expr_replacement, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_runtime_assert, test/dynamo/test_misc.py::MiscTests::test_shape_env_equal_unbacked, test/dynamo/test_misc.py::MiscTests::test_shape_env_no_recording, test/dynamo/test_misc.py::MiscTests::test_shape_env_recorded_function_fallback, test/dynamo/test_misc.py::MiscTests::test_shape_int_comparisons, test/dynamo/test_misc.py::MiscTests::test_shape_int_inplace_binops, test/dynamo/test_misc.py::MiscTests::test_shape_type, test/dynamo/test_misc.py::MiscTests::test_shape_unpack, test/dynamo/test_misc.py::MiscTests::test_side_effects_codegen_update_mutated, test/dynamo/test_misc.py::MiscTests::test_simple_set_usage, test/dynamo/test_misc.py::MiscTests::test_size_dim, test/dynamo/test_misc.py::MiscTests::test_size_input, test/dynamo/test_misc.py::MiscTests::test_slice_input, test/dynamo/test_misc.py::MiscTests::test_source_non_input_grad_access, test/dynamo/test_misc.py::MiscTests::test_sourceless_namedtuple, test/dynamo/test_misc.py::MiscTests::test_storage_return, test/dynamo/test_misc.py::MiscTests::test_str_format_assert1, test/dynamo/test_misc.py::MiscTests::test_str_format_assert2, test/dynamo/test_misc.py::MiscTests::test_str_format_return1, test/dynamo/test_misc.py::MiscTests::test_str_format_return2, test/dynamo/test_misc.py::MiscTests::test_stride_dim, test/dynamo/test_misc.py::MiscTests::test_structseq1, test/dynamo/test_misc.py::MiscTests::test_structseq2, test/dynamo/test_misc.py::MiscTests::test_super_after_graph_break, test/dynamo/test_misc.py::MiscTests::test_super_calling_with_metaclass, test/dynamo/test_misc.py::MiscTests::test_sym_and_terms, test/dynamo/test_misc.py::MiscTests::test_sym_constrain_range_on_replaced_unbacked_symbol, test/dynamo/test_misc.py::MiscTests::test_sym_max_unbacked_sizelike_simplification, test/dynamo/test_misc.py::MiscTests::test_symint_as_device_kwarg_multi_gpu, 
test/dynamo/test_misc.py::MiscTests::test_symint_as_device_kwarg_non_strict_export, test/dynamo/test_misc.py::MiscTests::test_symint_copy_into_unbacked_slice, test/dynamo/test_misc.py::MiscTests::test_symint_fold_nontrivial_product_modulo, test/dynamo/test_misc.py::MiscTests::test_sys_modules, test/dynamo/test_misc.py::MiscTests::test_tagging_tensors_mix_used_unused_structure, test/dynamo/test_misc.py::MiscTests::test_tagging_tensors_simple, test/dynamo/test_misc.py::MiscTests::test_tensor_build_list_unpack, test/dynamo/test_misc.py::MiscTests::test_tensor_ctor_list_of_tensor, test/dynamo/test_misc.py::MiscTests::test_tensor_data, test/dynamo/test_misc.py::MiscTests::test_tensor_dict1, test/dynamo/test_misc.py::MiscTests::test_tensor_dict2, test/dynamo/test_misc.py::MiscTests::test_tensor_dict3, test/dynamo/test_misc.py::MiscTests::test_tensor_dot_grad_no_graph_break, test/dynamo/test_misc.py::MiscTests::test_tensor_dynamic_method, test/dynamo/test_misc.py::MiscTests::test_tensor_hasattr, test/dynamo/test_misc.py::MiscTests::test_tensor_interacts_with_numpy_ndarray, test/dynamo/test_misc.py::MiscTests::test_tensor_is_contiguous, test/dynamo/test_misc.py::MiscTests::test_tensor_item_capture, test/dynamo/test_misc.py::MiscTests::test_tensor_item_no_capture, test/dynamo/test_misc.py::MiscTests::test_tensor_iter, test/dynamo/test_misc.py::MiscTests::test_tensor_layout, test/dynamo/test_misc.py::MiscTests::test_tensor_setattr_getset_descriptor, test/dynamo/test_misc.py::MiscTests::test_tensor_types, test/dynamo/test_misc.py::MiscTests::test_thread_local_setattr, test/dynamo/test_misc.py::MiscTests::test_tolist_0d, test/dynamo/test_misc.py::MiscTests::test_tolist_1d, test/dynamo/test_misc.py::MiscTests::test_tolist_float, test/dynamo/test_misc.py::MiscTests::test_tolist_kd, test/dynamo/test_misc.py::MiscTests::test_tolist_kd_dynamic, test/dynamo/test_misc.py::MiscTests::test_tolist_scalar, test/dynamo/test_misc.py::MiscTests::test_top_package_import, 
test/dynamo/test_misc.py::MiscTests::test_torch_check, test/dynamo/test_misc.py::MiscTests::test_torch_check_is_size, test/dynamo/test_misc.py::MiscTests::test_torch_check_symbolic_shape_rel, test/dynamo/test_misc.py::MiscTests::test_torch_compile_ctx_on_forward_and_training_step, test/dynamo/test_misc.py::MiscTests::test_torch_distributions_lazy_property, test/dynamo/test_misc.py::MiscTests::test_torch_dtype_python_type, test/dynamo/test_misc.py::MiscTests::test_torch_dynamo_codegen_pow, test/dynamo/test_misc.py::MiscTests::test_torch_generator_set_state, test/dynamo/test_misc.py::MiscTests::test_torch_guards_stack_frame_register_inlining, test/dynamo/test_misc.py::MiscTests::test_torch_guards_stack_frame_register_inlining_deep, test/dynamo/test_misc.py::MiscTests::test_torch_nn_parameter_isinstance, test/dynamo/test_misc.py::MiscTests::test_torch_objects_as_keys, test/dynamo/test_misc.py::MiscTests::test_torch_package_working_with_trace, test/dynamo/test_misc.py::MiscTests::test_torch_seed, test/dynamo/test_misc.py::MiscTests::test_torch_size, test/dynamo/test_misc.py::MiscTests::test_torch_size_numel, test/dynamo/test_misc.py::MiscTests::test_torch_size_numel_dynamic, test/dynamo/test_misc.py::MiscTests::test_torch_variable_hasattr, test/dynamo/test_misc.py::MiscTests::test_trace_ndarray_frame, test/dynamo/test_misc.py::MiscTests::test_trace_ndarray_frame_2, test/dynamo/test_misc.py::MiscTests::test_tuple_class, test/dynamo/test_misc.py::MiscTests::test_tuple_from_tuple_iter, test/dynamo/test_misc.py::MiscTests::test_tuple_hasattr, test/dynamo/test_misc.py::MiscTests::test_tuple_iadd_with_shape, test/dynamo/test_misc.py::MiscTests::test_tuple_mul, test/dynamo/test_misc.py::MiscTests::test_tuple_mul_with_shape, test/dynamo/test_misc.py::MiscTests::test_type_copy, test/dynamo/test_misc.py::MiscTests::test_typing_dict, test/dynamo/test_misc.py::MiscTests::test_typing_typevar, test/dynamo/test_misc.py::MiscTests::test_typing_union_and_optional, 
test/dynamo/test_misc.py::MiscTests::test_typing_variable_isinstance, test/dynamo/test_misc.py::MiscTests::test_unbacked_2d_expand, test/dynamo/test_misc.py::MiscTests::test_unbacked_empty_tensor, test/dynamo/test_misc.py::MiscTests::test_unbacked_repeat_cat, test/dynamo/test_misc.py::MiscTests::test_unbacked_sources_scalar, test/dynamo/test_misc.py::MiscTests::test_unbacked_sources_tensor, test/dynamo/test_misc.py::MiscTests::test_unbacked_strict_mode, test/dynamo/test_misc.py::MiscTests::test_unbacked_symint, test/dynamo/test_misc.py::MiscTests::test_unhandled_exception_in_dynamo, test/dynamo/test_misc.py::MiscTests::test_unhandled_exception_in_dynamo2, test/dynamo/test_misc.py::MiscTests::test_unique_consecutive, test/dynamo/test_misc.py::MiscTests::test_unpack4, test/dynamo/test_misc.py::MiscTests::test_unpack5, test/dynamo/test_misc.py::MiscTests::test_unpack_tensor_shape_mismatch, test/dynamo/test_misc.py::MiscTests::test_update_locals_and_stack_uses_shared_cache, test/dynamo/test_misc.py::MiscTests::test_user_code_statically_known, test/dynamo/test_misc.py::MiscTests::test_user_defined_binop, test/dynamo/test_misc.py::MiscTests::test_user_defined_class_name, test/dynamo/test_misc.py::MiscTests::test_user_defined_class_python_type, test/dynamo/test_misc.py::MiscTests::test_user_defined_iter, test/dynamo/test_misc.py::MiscTests::test_user_defined_object_class_interaction, test/dynamo/test_misc.py::MiscTests::test_user_defined_setattr1, test/dynamo/test_misc.py::MiscTests::test_user_defined_setattr2, test/dynamo/test_misc.py::MiscTests::test_user_function_variable_supports_enum_argument, test/dynamo/test_misc.py::MiscTests::test_user_function_variable_supports_function_argument, test/dynamo/test_misc.py::MiscTests::test_user_function_variable_supports_type_abcmeta_argument, test/dynamo/test_misc.py::MiscTests::test_user_getattr1, test/dynamo/test_misc.py::MiscTests::test_user_getattr2, test/dynamo/test_misc.py::MiscTests::test_user_getattribute, 
test/dynamo/test_misc.py::MiscTests::test_user_property, test/dynamo/test_misc.py::MiscTests::test_usr_cls_classmethod, test/dynamo/test_misc.py::MiscTests::test_usr_cls_staticmethod, test/dynamo/test_misc.py::MiscTests::test_validate_outputs_unbacked, test/dynamo/test_misc.py::MiscTests::test_validate_outputs_unbacked_by_custom_op, test/dynamo/test_misc.py::MiscTests::test_variable_access_in_exception, test/dynamo/test_misc.py::MiscTests::test_variable_tracker_recursively_contains, test/dynamo/test_misc.py::MiscTests::test_version_ci, test/dynamo/test_misc.py::MiscTests::test_with_builtin_type, test/dynamo/test_misc.py::MiscTests::test_write_to_cells_with_name_shadowing, test/dynamo/test_misc.py::MiscTests::test_write_to_closures_in_inlining, test/dynamo/test_misc.py::MiscTests::test_writes_to_cells_across_frames1, test/dynamo/test_misc.py::MiscTests::test_writes_to_cells_across_frames2, test/dynamo/test_misc.py::MiscTests::test_yield_from, test/dynamo/test_misc.py::MiscTests::test_yield_from_in_a_loop, test/dynamo/test_misc.py::MiscTests::test_yield_from_user_stop_iteration, test/dynamo/test_misc.py::MiscTests::test_yield_gen_and_from, test/dynamo/test_misc.py::MiscTests::test_yield_send_to_subgenerator_graph_break, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_flatten_unflatten_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_flatten_unflatten_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_leaves_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_leaves_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_map_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_map_only_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_map_only_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_pytree_tree_map_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_dicts_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_dicts_python, 
test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_mixed_all_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_mixed_all_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_pytree_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_pytree_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_tensor_subclass_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_tensor_subclass_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_tuples_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_nested_tuples_python, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_pytree_cxx, test/dynamo/test_misc.py::MiscTestsPyTree::test_tracing_pytree_python, test/dynamo/test_misc.py::TestTracer::test_jit_save, test/dynamo/test_misc.py::TestCustomFunction::test_autograd_function_with_matmul_folding_at_output, test/dynamo/test_misc.py::TestCustomFunction::test_retain_grad, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_cuda_set_device_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_dynamic_float_scalar_tensor_coersion_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_get_device_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_interpolate_propagate_real_tensors_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_legacy_cuda_tensor_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_parsing_sdpa_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_rand_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_randint_no_graphbreak_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_scalar_isin_decomposition_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_symint_as_device_kwarg_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_torch_cudnn_is_acceptable_bad_inputs_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_torch_cudnn_is_acceptable_cuda, 
test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_torch_device_is_available_cuda, test/dynamo/test_misc.py::MiscTestsDeviceCUDA::test_torch_device_python_type_cuda 2025-09-07T07:32:52.3638391Z 2025-09-07T07:32:52.3638639Z Running inductor/test_torchinductor_dynamic_shapes 1/2 ... [2025-09-07 07:32:52.338560] 2025-09-07T07:32:52.3639077Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:52.3640168Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:32:52.338917] 2025-09-07T07:32:54.4432253Z 2025-09-07T07:32:54.4433765Z inductor/test_loop_ordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_loop_ordering_1.1_ba3d8468e4e1c782_.log 2025-09-07T07:32:54.4454126Z Running 47 items in this shard: test/inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw, 
test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_view, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True, test/inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise, 
test/inductor/test_loop_ordering.py::TestTiling::test_cat, test/inductor/test_loop_ordering.py::TestTiling::test_mutation_deps, test/inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction 2025-09-07T07:32:54.4466771Z 2025-09-07T07:32:54.4466957Z Running inductor/test_cutlass_evt 1/1 ... [2025-09-07 07:32:54.443276] 2025-09-07T07:32:54.4467332Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:32:54.4468255Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutlass_evt.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:32:54.443639] 2025-09-07T07:33:01.3676902Z 2025-09-07T07:33:01.3678243Z inductor/test_cutlass_evt 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutlass_evt_1.1_fc538fb656d9cf1b_.log 2025-09-07T07:33:01.3683875Z Running 8 items in this shard: test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_evt_argument_codegen, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_evt_argument_codegen_return_accumulator, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_evt_codegen, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_example_tensor_creation, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_py_codegen, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_py_codegen_accumulator_return, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_py_codegen_broadcasting, test/inductor/test_cutlass_evt.py::TestCutlassEVT::test_py_codegen_disjoint_read_indexing 2025-09-07T07:33:01.3687270Z 2025-09-07T07:33:01.3687457Z Running dynamo/test_sets 1/1 ... [2025-09-07 07:33:01.367627] 2025-09-07T07:33:01.3687847Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:01.3688868Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sets.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:33:01.367964] 2025-09-07T07:33:05.4381407Z 2025-09-07T07:33:05.4382783Z dynamo/test_sets 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sets_1.1_0eb2ce1f53b39bf2_.log 2025-09-07T07:33:05.4415855Z Running 124 items in this shard: test/dynamo/test_sets.py::CustomSetTests::test_custom_add, test/dynamo/test_sets.py::CustomSetTests::test_custom_contains, test/dynamo/test_sets.py::MiscTests::test_isdisjoint_with_generator, test/dynamo/test_sets.py::TestSetGuards::test_in_guard, test/dynamo/test_sets.py::TestSetGuards::test_set_guard_on_keys_change, test/dynamo/test_sets.py::TestSetGuards::test_set_multiple_types, test/dynamo/test_sets.py::TestSetGuards::test_set_recompile_on_key_change, test/dynamo/test_sets.py::TestSetGuards::test_set_recompile_on_key_pop, test/dynamo/test_sets.py::TestSetGuards::test_set_with_function, test/dynamo/test_sets.py::TestSetGuards::test_set_with_tensors, test/dynamo/test_sets.py::FrozensetTests::test_binop_and, test/dynamo/test_sets.py::FrozensetTests::test_binop_or, test/dynamo/test_sets.py::FrozensetTests::test_binop_sub, test/dynamo/test_sets.py::FrozensetTests::test_binop_xor, test/dynamo/test_sets.py::FrozensetTests::test_cmp_eq, test/dynamo/test_sets.py::FrozensetTests::test_cmp_greater_than, test/dynamo/test_sets.py::FrozensetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::FrozensetTests::test_cmp_less_than, test/dynamo/test_sets.py::FrozensetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::FrozensetTests::test_cmp_ne, test/dynamo/test_sets.py::FrozensetTests::test_constructor_iterable, test/dynamo/test_sets.py::FrozensetTests::test_contains, test/dynamo/test_sets.py::FrozensetTests::test_copy, test/dynamo/test_sets.py::FrozensetTests::test_difference, test/dynamo/test_sets.py::FrozensetTests::test_equality, test/dynamo/test_sets.py::FrozensetTests::test_in_frozenset, test/dynamo/test_sets.py::FrozensetTests::test_intersection, 
test/dynamo/test_sets.py::FrozensetTests::test_isdisjoint, test/dynamo/test_sets.py::FrozensetTests::test_issubset, test/dynamo/test_sets.py::FrozensetTests::test_issuperset, test/dynamo/test_sets.py::FrozensetTests::test_symmetric_difference, test/dynamo/test_sets.py::FrozensetTests::test_to_frozenset, test/dynamo/test_sets.py::FrozensetTests::test_to_set, test/dynamo/test_sets.py::FrozensetTests::test_union, test/dynamo/test_sets.py::SetTests::test_add, test/dynamo/test_sets.py::SetTests::test_binop_and, test/dynamo/test_sets.py::SetTests::test_binop_or, test/dynamo/test_sets.py::SetTests::test_binop_sub, test/dynamo/test_sets.py::SetTests::test_binop_xor, test/dynamo/test_sets.py::SetTests::test_clear, test/dynamo/test_sets.py::SetTests::test_cmp_eq, test/dynamo/test_sets.py::SetTests::test_cmp_greater_than, test/dynamo/test_sets.py::SetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::SetTests::test_cmp_less_than, test/dynamo/test_sets.py::SetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::SetTests::test_cmp_ne, test/dynamo/test_sets.py::SetTests::test_constructor_iterable, test/dynamo/test_sets.py::SetTests::test_contains, test/dynamo/test_sets.py::SetTests::test_copy, test/dynamo/test_sets.py::SetTests::test_difference, test/dynamo/test_sets.py::SetTests::test_difference_update, test/dynamo/test_sets.py::SetTests::test_discard, test/dynamo/test_sets.py::SetTests::test_equality, test/dynamo/test_sets.py::SetTests::test_in_frozenset, test/dynamo/test_sets.py::SetTests::test_intersection, test/dynamo/test_sets.py::SetTests::test_intersection_update, test/dynamo/test_sets.py::SetTests::test_isdisjoint, test/dynamo/test_sets.py::SetTests::test_issubset, test/dynamo/test_sets.py::SetTests::test_issuperset, test/dynamo/test_sets.py::SetTests::test_pop, test/dynamo/test_sets.py::SetTests::test_remove, test/dynamo/test_sets.py::SetTests::test_symmetric_difference, test/dynamo/test_sets.py::SetTests::test_symmetric_difference_update, 
test/dynamo/test_sets.py::SetTests::test_to_frozenset, test/dynamo/test_sets.py::SetTests::test_to_set, test/dynamo/test_sets.py::SetTests::test_union, test/dynamo/test_sets.py::SetTests::test_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_add, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_and, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_or, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_sub, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_xor, test/dynamo/test_sets.py::UserDefinedSetTests::test_clear, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_eq, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_greater_than, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_less_than, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_ne, test/dynamo/test_sets.py::UserDefinedSetTests::test_constructor_iterable, test/dynamo/test_sets.py::UserDefinedSetTests::test_contains, test/dynamo/test_sets.py::UserDefinedSetTests::test_copy, test/dynamo/test_sets.py::UserDefinedSetTests::test_difference, test/dynamo/test_sets.py::UserDefinedSetTests::test_difference_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_discard, test/dynamo/test_sets.py::UserDefinedSetTests::test_equality, test/dynamo/test_sets.py::UserDefinedSetTests::test_in_frozenset, test/dynamo/test_sets.py::UserDefinedSetTests::test_intersection, test/dynamo/test_sets.py::UserDefinedSetTests::test_intersection_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_isdisjoint, test/dynamo/test_sets.py::UserDefinedSetTests::test_issubset, test/dynamo/test_sets.py::UserDefinedSetTests::test_issuperset, test/dynamo/test_sets.py::UserDefinedSetTests::test_pop, test/dynamo/test_sets.py::UserDefinedSetTests::test_remove, 
test/dynamo/test_sets.py::UserDefinedSetTests::test_symmetric_difference, test/dynamo/test_sets.py::UserDefinedSetTests::test_symmetric_difference_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_to_frozenset, test/dynamo/test_sets.py::UserDefinedSetTests::test_to_set, test/dynamo/test_sets.py::UserDefinedSetTests::test_union, test/dynamo/test_sets.py::UserDefinedSetTests::test_update, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_and, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_or, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_sub, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_xor, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_eq, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_greater_than, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_less_than, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_ne, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_constructor_iterable, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_contains, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_copy, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_difference, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_equality, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_in_frozenset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_intersection, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_isdisjoint, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_issubset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_issuperset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_symmetric_difference, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_to_frozenset, 
test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_to_set, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_union 2025-09-07T07:33:05.4440670Z 2025-09-07T07:33:05.4440843Z Running test_numpy_interop 1/1 ... [2025-09-07 07:33:05.438255] 2025-09-07T07:33:05.4441209Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:05.4442120Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_numpy_interop.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:05.438607] 2025-09-07T07:33:09.5090245Z 2025-09-07T07:33:09.5091307Z test_numpy_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numpy_interop_1.1_ae97a61901d05f93_.log 2025-09-07T07:33:09.5111255Z Running 44 items in this shard: test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_invalid_numpy_array_sequence_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_numpy_scalar_ctor_cuda, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_empty_tensors_interop_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_list_of_ndarray_warning_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_no_leak_on_invalid_dtype_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_zero_element_type_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_has_storage_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_multiplication_numpy_scalar_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_2_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_array_interface_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_multi_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_non_writeable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bfloat16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int64, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_unresizable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_overflow_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_bool_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_force_argument_cuda 2025-09-07T07:33:09.5123289Z 2025-09-07T07:33:09.5123561Z Running inductor/test_cudagraph_trees_expandable_segments 1/1 ... [2025-09-07 07:33:09.509051] 2025-09-07T07:33:09.5124009Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:09.5124999Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudagraph_trees_expandable_segments.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:33:09.509405] 2025-09-07T07:33:16.6837717Z 2025-09-07T07:33:16.6839386Z inductor/test_cudagraph_trees_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudagraph_trees_expandable_segments_1.1_0695c7e75d1a33fb_.log 2025-09-07T07:33:16.6901048Z Running 145 items in this shard: test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_accumulate_grad, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_accumulate_multiple_recordings, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_alias_of_parameter, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_aliased_output_checkpoint, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_aliased_static_parameter, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_aliased_storage_single_weakref, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_aliasing_static_ref, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_amp_cache_disabled, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_backward_gets_cached_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cache_hit_forward_miss_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cached_boxed_forward_device_index, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cached_forward_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_checkpoint_shared_output_storage_deallocation, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_checkpointing_resets_persistent_refs, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cleanup, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_compiled_autograd_static_input_params, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_constant_output, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_conv_benchmark, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cpp_wrapper, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cudagraph_capture_sizes, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cudagraph_capture_sizes1, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cudagraph_capture_sizes2, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_cudagraph_or_error, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_dynamic_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_dynamic_warmup, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_empty_cpu_tensor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_empty_storage, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_end_recording_early, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_error_on_dealloc_use, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_error_on_dealloc_use2, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_execution_into_recording, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_expanded_inputs, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_forward_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_inductor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_forward_generation, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_frozen_fn, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_function_compiled_multiple_times, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_buffer_reuse, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_condition_op, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_only, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_op_and_dynamic_shapes, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar1, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar2, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar3, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar4, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_device_put, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_multiple, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_mutation, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_cpu_tensor_symints, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_custom_op, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_custom_op_dynamoc_shapes, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_custom_op_no_split, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_dynamic_scalar_inputs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_dynamic_shapes, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_foreach_op, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_forward_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_forward_backward_not_called, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_fused_scheduler_node, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_gc, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_item, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_log_message, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_multiple_devices_msg, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_reduce_overhead_mode_effectiveness, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu_interleave, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency1, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_simple, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_symint, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_symint_cat_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_symint_from_mutation_index, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_symint_from_nested_indirect_indexing, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint_multi_output_layout, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_item, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_backend, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_graph_breaks, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_index_put, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_live_outputs_multiple_graphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_manager_per_device, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mark_step, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_meta_tensor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_child_node, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_custom_module, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_custom_module_buffer, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_parent_node, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module_buffers, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_param_inputs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multinomial, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_inductor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_multiple_insert_removal_caching, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_inductor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_inductor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_inductor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_inductor, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_on_inp_backend_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_on_inp_backend_inductor, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_mutation_reinplaced, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_no_rerecord_with_mark_static_address, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_not_fallback_to_eager_if_have_not_recompiling_too_many_times, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_output_alias, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_peristed_output_livenes, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_remove_hooks_on_cached_tensors, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_rerecord_if_static_input_address_changed, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_rng_non_trees, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_rng_trees, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_run_simple, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_separate_recordings, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_side_stream_memory_allocation, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_single_stream_use, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_skip_cpp_wrapper, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_skip_cudagraph_unsafe_ops, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached1, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached2, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_skip_symbolic, 
test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_sparsity, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_static_inputs_address_mutation_log, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_storage_access_error, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_tensor_constant_mutation, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_tensor_dies_between_checkpoint, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_tensor_no_longer_in_pool, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_unaligned_static_input_no_cudagraphs, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_unaligned_static_input_non_trees, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_unaligned_static_input_trees, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_unaligned_static_parameter, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_unstable_ptr, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_warmup_stream_sync, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_warn_on_pending_backward, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_warn_once_if_dynamic_shape_limit_reached, test/inductor/test_cudagraph_trees_expandable_segments.py::CudaGraphTreeTests::test_workspace_allocation_error 2025-09-07T07:33:16.6956525Z 2025-09-07T07:33:16.6956760Z Running dynamo/test_backward_higher_order_ops 1/1 ... 
[2025-09-07 07:33:16.684021] 2025-09-07T07:33:16.6957188Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:16.6958254Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_backward_higher_order_ops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:16.684380] 2025-09-07T07:33:20.6044540Z 2025-09-07T07:33:20.6045501Z dynamo/test_backward_higher_order_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_backward_higher_order_ops_1.1_7714b312cd99755b_.log 2025-09-07T07:33:20.6049902Z Running 7 items in this shard: test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_in_eager, test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_in_pt2, test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_in_pt2_compiled_autograd, test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_in_pt2_compiled_autograd_graph_breaks, test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_in_pt2_compiled_autograd_side_effect, test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_make_bw, test/dynamo/test_backward_higher_order_ops.py::BackwardHigherOrderOpTests::test_invoke_make_fx_forward_contrived 2025-09-07T07:33:20.6053346Z 2025-09-07T07:33:20.6053684Z Running inductor/test_torchinductor_codegen_config_overrides 1/1 ... 
[2025-09-07 07:33:20.604412] 2025-09-07T07:33:20.6054356Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:20.6055526Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_config_overrides.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:20.604786] 2025-09-07T07:33:27.5789795Z 2025-09-07T07:33:27.5791296Z inductor/test_torchinductor_codegen_config_overrides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_config_overrides_1.1_3a14558d971425d6_.log 2025-09-07T07:33:27.5794982Z Running 3 items in this shard: test/inductor/test_torchinductor_codegen_config_overrides.py::CodegenInductorTest::test_force_pointwise_cat_force_pointwise_cat_False, test/inductor/test_torchinductor_codegen_config_overrides.py::CodegenInductorTest::test_force_pointwise_cat_force_pointwise_cat_True, test/inductor/test_torchinductor_codegen_config_overrides.py::CodegenInductorTest::test_kernel_fusion_thresholds 2025-09-07T07:33:27.5797392Z 2025-09-07T07:33:27.5797679Z Running test_nestedtensor 1/1 ... [2025-09-07 07:33:27.578984] 2025-09-07T07:33:27.5798284Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:27.5799815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:33:27.579346] 2025-09-07T07:33:35.9564539Z 2025-09-07T07:33:35.9565583Z test_nestedtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_1.1_d66fc06a374d0e0e_.log 2025-09-07T07:33:36.0188774Z Running 1590 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, 
test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_cat, test/test_nestedtensor.py::TestNestedTensor::test_copy_, test/test_nestedtensor.py::TestNestedTensor::test_default_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_dim, test/test_nestedtensor.py::TestNestedTensor::test_fill_, test/test_nestedtensor.py::TestNestedTensor::test_is_contiguous, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_ones_like, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_randn_like, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_zeros_like, test/test_nestedtensor.py::TestNestedTensor::test_nested_namespace, test/test_nestedtensor.py::TestNestedTensor::test_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_nested_tensor_matching_dim, test/test_nestedtensor.py::TestNestedTensor::test_nested_view_from_buffer_overflow_errors, test/test_nestedtensor.py::TestNestedTensor::test_numel, 
test/test_nestedtensor.py::TestNestedTensor::test_repr_string, test/test_nestedtensor.py::TestNestedTensor::test_size, test/test_nestedtensor.py::TestNestedTensor::test_size_dim, test/test_nestedtensor.py::TestNestedTensor::test_stride, test/test_nestedtensor.py::TestNestedTensor::test_to, test/test_nestedtensor.py::TestNestedTensor::test_to_padded_tensor_on_empty_tensor, test/test_nestedtensor.py::TestNestedTensor::test_unbind_0, test/test_nestedtensor.py::TestNestedTensor::test_unbind_1, test/test_nestedtensor.py::TestNestedTensor::test_unbind_3, test/test_nestedtensor.py::TestNestedTensor::test_unbind_4, test/test_nestedtensor.py::TestNestedTensor::test_unbind_dim, test/test_nestedtensor.py::TestNestedTensor::test_zero_, test/test_nestedtensor.py::TestNestedInt::test_comparisons, test/test_nestedtensor.py::TestNestedInt::test_with_factor, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_binary_ops_with_scalar_eq_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_binary_ops_with_scalar_ge_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_device_checks_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_strided_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_with_bmm_path_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_with_bmm_path_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_masked_select_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sum_dim_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_then_from_padded_tensor_no_transform0213_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_cos_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu__cuda, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isinf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isnan_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isneginf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isposinf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_logical_not_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_neg_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sgn_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_silu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_silu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sin_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sqrt_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float64, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_abs_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_accumulate_grad_different_strides_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_as_nested_tensor_propagates_gradients_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_add_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_add_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_sub_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_sub_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_gelu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_indexing_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_edge_case_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1023_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1024_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_256_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_512_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_513_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_masked_fill_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_mask_and_to_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_fused_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_generates_leaf_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_plus_transpose_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_backward_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_to_padded_tensor_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_relu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_selu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_mask_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_split_with_sizes_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_to_buffer_series_ops_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_unbind_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_values_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_apply__cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_autograd_function_with_None_grad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_broadcasting_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_transposed_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_with_nested_int_second_arg_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_broadcast_shapes_on_in_graph_constructed_njt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_chunk_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_padded_dense_conversion_preserves_metadata_cache_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_preserves_metadata_cache_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_dynamic_max_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_dynamic_min_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_propagated_dynamic_max_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_in_inference_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_with_custom_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_construction_from_list_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_copy__cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dropout_inference_mode_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flatten_decomp_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_converts_stacked_seq_indices_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_noncontig_with_holes_False_cross_attention_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_noncontig_with_holes_False_cross_attention_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_noncontig_with_holes_True_cross_attention_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_noncontig_with_holes_True_cross_attention_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_index_put_error_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_contiguous_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_same_size_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_with_pinned_memory_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layout_under_torch_dispatch_mode_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_full_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_ones_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_rand_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randint_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_zeros_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_5_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_narrow_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_activation_checkpoint_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_fx_trace_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_njt_cat_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_permute_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_pin_memory_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_profiler_sequence_nr_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_record_stream_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_autocast_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_flop_counter_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_contig_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_contig_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_recompile_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_with_sizes_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_squeeze_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_tensor_attributes_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_threshold_backward_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_copy_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_dtype_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unary_pointwise_transposed_inputs_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_0_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_1_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_equals_2_bad_dim_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_last_dim_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unsafe_view_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_view_ragged_idx_not_one_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_views_inherit_ragged_dim_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acosh_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_physical_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmin_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_norm_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nansum_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_silu_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_0_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_xlog1py_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___radd___cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cdouble_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_double_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ldexp_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_matmul_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_bag_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softsign_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_3_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_where_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_all_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_any_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asinh_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bool_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_byte_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_char_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_copysign_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_heaviside_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_int_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isinf_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_or_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_reduction_with_dim_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ne_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_bag_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softsign_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_neg_3_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_airy_ai_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_v_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_h_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtri_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_spherical_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_zeta_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_unbiased_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acos_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_all_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_any_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bool_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_byte_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chalf_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_char_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfc_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_heaviside_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igamma_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_int_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_or_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_matmul_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ne_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_bag_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_threshold_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sgn_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_airy_ai_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_y0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_h_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_he_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_scaled_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_spherical_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_zeta_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_input_mutation_backward_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_non_contiguous_mutation_cuda 2025-09-07T07:33:36.0782233Z 2025-09-07T07:33:36.0782446Z Running dynamo/test_export_mutations 1/1 ... [2025-09-07 07:33:35.958844] 2025-09-07T07:33:36.0782824Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:36.0783749Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_export_mutations.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:35.959203] 2025-09-07T07:33:39.9295819Z 2025-09-07T07:33:39.9297048Z dynamo/test_export_mutations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_export_mutations_1.1_cabd1e17ed53bf94_.log 2025-09-07T07:33:39.9301436Z Running 5 items in this shard: test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_1, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_2, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_3, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_4, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_positive_1 2025-09-07T07:33:39.9305007Z 2025-09-07T07:33:39.9305336Z Running inductor/test_scatter_optimization 1/1 ... 
[2025-09-07 07:33:39.929586]
2025-09-07T07:33:39.9305797Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:33:39.9306880Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_scatter_optimization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:39.930005]
2025-09-07T07:33:40.9376231Z
2025-09-07T07:33:40.9377636Z inductor/test_minifier_isolate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_isolate_1.1_edb7353c22415b07_.log
2025-09-07T07:33:40.9379950Z Running 2 items in this shard: test/inductor/test_minifier_isolate.py::MinifierIsolateTests::test_after_aot_cpu_runtime_error, test/inductor/test_minifier_isolate.py::MinifierIsolateTests::test_after_aot_gpu_runtime_error
2025-09-07T07:33:40.9381279Z
2025-09-07T07:33:40.9381532Z Running test_ops_jit 1/1 ... [2025-09-07 07:33:40.937696]
2025-09-07T07:33:40.9382087Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:33:40.9383611Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-09-07 07:33:40.938080] 2025-09-07T07:33:47.0941183Z 2025-09-07T07:33:47.0942436Z test_ops_fwd_gradients 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_fwd_gradients_1.1_aa52e29340900e6c_.log 2025-09-07T07:33:47.2076957Z Running 3195 items in this shard: test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___getitem___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rdiv___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmod___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rpow___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rpow___cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rsub___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__batch_norm_with_update_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__chunk_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__chunk_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__native_batch_norm_legit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__softmax_backward_data_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_abs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acos_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcdiv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_alias_copy_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_all_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_allclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argwhere_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_partial_views_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atanh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_3d_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bernoulli_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bfloat16_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bfloat16_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_block_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bool_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_byte_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cartesian_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cartesian_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ceil_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cfloat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chalf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_char_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_char_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_max_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_min_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clone_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_combinations_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_constant_pad_nd_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_contiguous_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_corrcoef_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_corrcoef_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cos_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_count_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_count_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumprod_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_deg2rad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diff_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diff_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dist_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_no_rounding_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_einsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_permuted_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_permuted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_equal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_erf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_erfinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_copy_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eye_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ihfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ihfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfftn_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flip_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flipud_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_float_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_float_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_float_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_floor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_floor_divide_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fmod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gather_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gather_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ge_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_geometric_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_geqrf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_geqrf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gradient_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gt_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_half_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hash_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_igamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_imag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_fill_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_inner_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_int_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isfinite_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isfinite_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isinf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isinf_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isnan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isposinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isreal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isreal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_item_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_item_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_unary_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_kron_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_kron_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ldexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_le_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cond_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_det_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eig_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigvalsh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigvalsh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_norm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_hermitian_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svdvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorinv_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorsolve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vander_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vecdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log1p_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_and_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_not_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_not_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_or_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_xor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_xor_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_long_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_unpack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mH_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mT_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_normalize_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_std_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matrix_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matrix_exp_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mm_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_movedim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_multinomial_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nan_to_num_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanmedian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nansum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nansum_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ne_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ne_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_ones_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nextafter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_binary_cross_entropy_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_cosine_similarity_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_embedding_bag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_gaussian_nll_loss_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_glu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_group_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hardshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hardswish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_huber_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_nearest-exact_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_l1_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool1d_grad_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_mish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_mse_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_normalize_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_circular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_replicate_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_prelu_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_relu6_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_silu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softsign_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_tanhshrink_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_tanhshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_threshold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_fro_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_fro_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_inf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_nuc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_number_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ormqr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_outer_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pca_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pinverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pinverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_prod_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rand_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ravel_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_real_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reciprocal_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reciprocal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_renorm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resize__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resize_as__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resize_as__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_neg_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_roll_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rot90_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_neg_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scalar_tensor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scalar_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_searchsorted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sgn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_short_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_cosine_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_general_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_kaiser_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_mm_reduce_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_y1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_erfcx_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_hermite_polynomial_h_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i1e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_log_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_zeta_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_list_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_square_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_multiple_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sub_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sum_to_size_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sum_to_size_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_along_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tanh_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tensor_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tensordot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tensordot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_sparse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapz_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_triangular_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_triangular_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_triu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_true_divide_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unflatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unflatten_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_uniform_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_uniform_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unique_consecutive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unique_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vstack_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_where_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___getitem___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rdiv___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmatmul___cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmod___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rpow___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rpow___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rsub___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__batch_norm_with_update_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__chunk_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__chunk_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__native_batch_norm_legit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__softmax_backward_data_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__unsafe_masked_index_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__unsafe_masked_index_put_accumulate_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__upsample_bilinear2d_aa_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_abs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_acos_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_acosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_acosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcdiv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_decomposed_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_alias_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_all_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_allclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argmin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argwhere_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_partial_views_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atan2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atanh_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bernoulli_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bfloat16_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bfloat16_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_block_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bool_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_broadcast_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_broadcast_tensors_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_broadcast_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_byte_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cartesian_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cartesian_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ceil_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cfloat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chalf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_char_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_char_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_inverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_inverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_max_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_min_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clone_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_combinations_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_constant_pad_nd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_contiguous_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_corrcoef_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_corrcoef_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cos_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_count_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_count_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cross_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_deg2rad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diff_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diff_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dist_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_no_rounding_mode_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_no_rounding_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dstack_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_einsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_permuted_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_permuted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_equal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_erf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_erfinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eye_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftshift_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft2_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_rfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_rfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flip_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flipud_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_power_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_floor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_floor_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fmod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gather_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gather_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ge_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_geometric_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_geqrf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_geqrf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gradient_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_half_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hash_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_igamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_imag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_copy_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_inner_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_int_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isfinite_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isfinite_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isinf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isnan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isposinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isreal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isreal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_item_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_item_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_return_by_ref_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_unary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kron_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kron_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ldexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_le_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cond_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cross_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_det_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eig_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvalsh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvalsh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_solve_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_hermitian_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_triangular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svdvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorinv_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorsolve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vander_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vecdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log1p_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logaddexp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_and_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_not_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_not_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_or_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_xor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_xor_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_long_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_unpack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mH_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mT_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_normalize_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_std_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matrix_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matrix_exp_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_variadic_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mode_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_movedim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_multinomial_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nan_to_num_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmedian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nansum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nansum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_copy_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ne_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ne_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_zeros_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nextafter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_binary_cross_entropy_with_logits_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_channel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cosine_similarity_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_ctc_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_embedding_bag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_glu_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_group_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardswish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_huber_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_trilinear_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_l1_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_logsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool2d_grad_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_mish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_mse_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_normalize_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_circular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_constant_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_reflect_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_reflect_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_replicate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_replicate_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_relu6_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rms_norm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rms_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_silu_complex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_silu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softsign_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_tanhshrink_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_tanhshrink_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_threshold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_upsample_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_fro_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_fro_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_inf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_nuc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_number_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ormqr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_outer_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pca_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_copy_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pinverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pinverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_4_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_put_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rand_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ravel_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_real_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reciprocal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reciprocal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_renorm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resize__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resize_as__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resize_as__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_roll_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rot90_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_decimals_neg_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scalar_tensor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scalar_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_mean_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_searchsorted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sgn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_short_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_general_cosine_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_general_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_kaiser_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sort_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sparse_mm_reduce_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sparse_sampled_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sparse_sampled_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_y1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_erfcx_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_hermite_polynomial_h_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_hermite_polynomial_he_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_i1_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_i1e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_laguerre_polynomial_l_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_log_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_zeta_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_list_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_square_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_multiple_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_to_size_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_to_size_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_svd_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_along_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tanh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tensor_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tensordot_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tensordot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_sparse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapz_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triangular_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triangular_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_true_divide_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unbind_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unbind_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unbind_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unflatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unflatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_uniform_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_uniform_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unique_consecutive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unique_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsafe_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsafe_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_unbiased_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_where_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___getitem___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rdiv___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmod___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmul___cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rpow___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rpow___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rsub___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__batch_norm_with_update_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__chunk_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__chunk_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__native_batch_norm_legit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__softmax_backward_data_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__upsample_bilinear2d_aa_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_abs_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acos_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcdiv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmv_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_alias_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_all_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_allclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argmin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argwhere_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_partial_views_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_asin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_asinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atan2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atan_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atanh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bernoulli_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bfloat16_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bfloat16_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_block_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bool_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_broadcast_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_broadcast_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_broadcast_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_byte_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cartesian_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cartesian_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ceil_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cfloat_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chalf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_char_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_char_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_inverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_inverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clamp_max_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clamp_min_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clone_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_combinations_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_constant_pad_nd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_contiguous_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_corrcoef_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_corrcoef_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cos_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_count_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_count_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_deg2rad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diff_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diff_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dist_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_no_rounding_mode_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_no_rounding_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_einsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_permuted_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_permuted_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_equal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erfinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eye_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfft_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfftn_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flip_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flipud_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_power_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_floor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_floor_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gather_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gather_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ge_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_geometric_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_geqrf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_geqrf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gradient_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_half_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hash_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_igamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_imag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_add_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_inner_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_int_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isclose_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isfinite_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isfinite_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isinf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isnan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isposinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isreal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isreal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_item_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_item_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_unary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_kron_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_kron_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ldexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_le_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cond_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_det_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eig_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvals_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvalsh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvalsh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_triangular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svdvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorsolve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vander_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vecdot_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log1p_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_with_dtype_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logaddexp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_and_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_not_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_not_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_or_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_xor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_tensor_overload_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_long_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_unpack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mH_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mT_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_amin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_normalize_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_std_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matrix_exp_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_max_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_max_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_variadic_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_minimum_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_movedim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_multinomial_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nan_to_num_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmedian_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nansum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nansum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_narrow_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_narrow_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_narrow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ne_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ne_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_strided_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nextafter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool2d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_channel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose1d_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_cosine_similarity_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_ctc_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_embedding_bag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_embedding_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_glu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_group_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hardshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hardsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hardswish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hinge_embedding_loss_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_huber_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_l1_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_linear_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_logsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_mish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_mse_loss_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_normalize_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_circular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_constant_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_reflect_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_reflect_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_relu6_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rms_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rms_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rrelu_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_silu_complex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_silu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softsign_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_tanhshrink_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_tanhshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_threshold_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_upsample_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_fro_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_fro_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_inf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_nuc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_number_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ormqr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_outer_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pca_lowrank_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pinverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pinverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_4_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_prod_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rand_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randn_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ravel_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_real_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reciprocal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reciprocal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_renorm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize_as__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize_as__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_roll_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rot90_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_neg_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scalar_tensor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scalar_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_add_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_searchsorted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sgn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_short_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sigmoid_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_general_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_kaiser_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sinc_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sparse_mm_reduce_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sparse_sampled_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sparse_sampled_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_y1_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_erfcx_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_hermite_polynomial_h_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_hermite_polynomial_he_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_i1e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_laguerre_polynomial_l_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_log_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_i1_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_zeta_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_list_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_square_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_multiple_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stack_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_to_size_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_to_size_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_take_along_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_take_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tanh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensor_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensordot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensordot_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_sparse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trapz_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triangular_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triangular_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_true_divide_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unflatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unflatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_uniform_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_uniform_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unique_consecutive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unique_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsqueeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_unbiased_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_where_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_like_cuda_float64
2025-09-07T07:33:47.3135111Z
2025-09-07T07:33:47.3135359Z Running torch_np/numpy_tests/core/test_multiarray 1/2 ... [2025-09-07 07:33:47.099359]
2025-09-07T07:33:47.3135786Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:33:47.3136763Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_multiarray.py', '-m', 'not serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:47.099690]
2025-09-07T07:33:47.3548527Z
2025-09-07T07:33:47.3549463Z inductor/test_scatter_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_scatter_optimization_1.1_437ef090dfa41a57_.log
2025-09-07T07:33:47.3553196Z Running 8 items in this shard: test/inductor/test_scatter_optimization.py::TestScatterOpt::test_3d_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_dense, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_non_const, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_cross_entropy_loss, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_neg_scatter_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_non_last_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_nonzero_const_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_shorter_index_tensor
2025-09-07T07:33:47.3556672Z
2025-09-07T07:33:47.3617058Z Running torch_np/numpy_tests/core/test_multiarray 2/2 ... [2025-09-07 07:33:47.355004]
2025-09-07T07:33:47.3617755Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:33:47.3619371Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_multiarray.py', '-m', 'not serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:47.355426]
2025-09-07T07:33:52.1325200Z
2025-09-07T07:33:52.1326148Z torch_np/numpy_tests/core/test_multiarray 2/2 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_multiarray_2.2_0243d6dcc9170108_.log
2025-09-07T07:33:52.1471337Z Running 430 items in this shard: test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_otherflags, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_void_align, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_from_readonly, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_pickle, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_attributes_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_dtypeattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_max_uint64, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_struct_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_stridesattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_0d_array_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asfortranarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_true_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_array, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_unicode_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestDtypedescr::test_construction, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_ellipsis_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_empty_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_subscript_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_overlapping_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_array_of_ragged_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_deep_nonragged_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_failed_len_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_malloc_fails, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_no_len_object_type, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_non_sequence_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_void, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_like_like_zeros, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_obj, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_obj_obj, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_unicode, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero_unaligned, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_sum_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__complex__, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__complex__should_not_work, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__deepcopy___dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_all_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_?, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_F, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_e, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_2_func0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_2_func1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_choose, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_choose_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_compress, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_conjugate, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal_memleak, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_matmul_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_D, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_?, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_put, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_repeat, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_reshape, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_round, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_default_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_f16, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_n_elements, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_resetting, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_type_specific, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_type_specific_2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_unaligned_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_with_invalid_sorter, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_with_sorter, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_size_zero_memleak, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype1_part_real, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_size_0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_swapaxes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_trace, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_transpose, test/torch_np/numpy_tests/core/test_multiarray.py::TestCequenceMethods::test_array_contains, test/torch_np/numpy_tests/core/test_multiarray.py::TestBinop::test_inplace, test/torch_np/numpy_tests/core/test_multiarray.py::TestSubscripting::test_test_zero_rank, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_assign_mask2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_mask2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_tuple, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data10, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data11, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data12, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data14, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data17, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data18, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data19, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data20, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data22, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data23, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data24, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data27, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data28, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data31, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data32, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data33, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data35, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data36, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data37, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data38, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data4, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data40, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data41, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data43, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data47, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data48, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data49, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data5, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data50, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data51, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data53, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data54, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data55, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data57, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data59, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data6, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data60, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data61, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data7, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data9, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data10, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data12, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data13, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data14, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data18, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data19, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data20, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data21, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data22, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data23, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data24, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data25, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data29, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data31, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data36, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data38, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data39, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data4, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data42, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data44, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data47, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data49, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data53, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data54, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data55, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data56, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data58, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data59, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data6, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data60, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data61, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinMax::test_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_max_or_min, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_byteorder_greater_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_ip_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_kwargs, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_mask_size, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_record_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_writeable, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape2, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_wrap, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype6, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype7, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_invalid_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_big_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_bool_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_counted_string_with_ws, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_empty_files_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_subarray_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromstring_count0, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_int64_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_io_open_unbuffered_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_long_sep, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_nan, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_parsing_subarray_unsupported, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_file, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_str, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_sep, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_uint64_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_array_base_obj0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_array_base_obj_12345678, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_0d_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_check_reference, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_check_weakref, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_invalid_arguments, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_ddof_too_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_axis_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_std_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_std_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_values_complex_dtype_complex128_ndec_7, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_values_complex_dtype_complex64_ndec_6, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_uncontiguous_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_accelerate_framework_sgemv_fix, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_array_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotcolumnvect1, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat3, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecscalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mv12, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN9, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvn10, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_exception_add, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_exception_multiply, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matrix_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matrix_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_arg, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_contiguous_2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_result_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_array_priority_override, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matrix_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matrix_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_result_types_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_vector_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_product_reversed_view, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_product_with_various_contiguities, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_vecself, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_broadcast2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops2, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_broadcast1, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_broadcast2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWarnings::test_complex_warning, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_float, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_nonscalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_field_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_padding_with_array_inside_struct, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_trailing_padding, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test___array__, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_buffer_interface, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_compatible_cast, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_K, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_flags_not_writable_attribute_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_flags_writable_attribute_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_not_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_array_scalar_relational_operation, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_bool_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_int_scalar_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_exotic, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_exotic_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_foreign, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_kwargs, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_largedim, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_1d_no_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_argmax_with_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_argmin_with_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_choose_mod_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_flatiter__array__, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_insert_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_putmask_noncontiguous, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_take_mode_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_arange_booleans, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_require_range, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_require_range_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_start_stop_kwarg, test/torch_np/numpy_tests/core/test_multiarray.py::TestRichcompareScalar::test_richcompare_scalar_boolean_singleton_return, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_128, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_191, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_256, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_512, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_8, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_96, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_int 2025-09-07T07:33:52.1604591Z 2025-09-07T07:33:52.1604759Z Running functorch/test_ac 1/1 ... 
[2025-09-07 07:33:52.133325] 2025-09-07T07:33:52.1605113Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:52.1606114Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:33:52.133745] 2025-09-07T07:33:59.3585386Z 2025-09-07T07:33:59.3586233Z functorch/test_ac 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_1.1_817b5455cac44269_.log 2025-09-07T07:33:59.3586935Z 2025-09-07T07:33:59.3587724Z Running dynamo/test_higher_order_ops 1/1 ... [2025-09-07 07:33:59.358586] 2025-09-07T07:33:59.3588258Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:33:59.3592855Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_higher_order_ops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:33:59.359046] 2025-09-07T07:34:07.6859598Z 2025-09-07T07:34:07.6860589Z dynamo/test_higher_order_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_higher_order_ops_1.1_eb8408f03e0aef7f_.log 2025-09-07T07:34:07.6936752Z Running 229 items in this shard: test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_access_module_attr, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_allow_python_side_effects_utility, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_constants, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_global_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_global_num_adds_guard, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_input_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_numpy_number, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_tracked, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_tracked_nested, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_untracked_global, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_untracked_global_nested, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_untracked_nonlocal, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_value_created_in_subgraph, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_concat_unbacked_shape_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_branches_no_arguments, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_branches_no_arguments_no_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_free_variable_in_both_branches, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_graph_break_in_one_branch, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_pytree_operands, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_pytree_operands_with_non_tensor_leaves, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_side_effect_in_one_branches, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_subgraph_name_is_valid, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_with_constant_pred, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_with_empty_operands, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_dynamic_shapes_over_vmap_batch_size, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_enum_arg, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_error_message_sane, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_fallback_on_graph_break_complicated, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_fallback_on_graph_break_simple, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_flat_list_output, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_fn_with_kwargs_in_torch_ops, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_freevars_as_inputs_to_wrap, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_grad_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper_incorrect_type, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper_no_hints, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper_pytree_inputs, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hooks, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hopify_generic_wrap, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_inlined_functions, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_internal_nonlocal, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_lift_tensor_constant, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_lift_tensors_with_compound_expressions, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_lift_tensors_with_shared_symbols, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_make_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_example_value_metadata_consistent_with_eager, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_kwargs, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_lowers_to_graph, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_multi_return, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_pytree_return, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_side_effect, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_subgraph_name_is_valid, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_symint_input, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_modules, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_nested_tuple_output, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_nested_wrap, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_no_freevars, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_output_with_dict, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_register_mode, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_register_subclass, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_return_captured_var, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_return_captured_var_used_multiple_times, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_return_captured_vars, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_same_freevar_twice, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_global_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_global_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_nonlocal_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_nonlocal_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_in_body, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_local_list_append_no_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_list, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_num_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_tensor_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_num_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_tensor_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_nested_nonlocal_list_append_graph_break, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_nonlocal_list_append_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_global_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_global_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_nonlocal_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_nonlocal_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_global_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_global_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_nonlocal_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_nonlocal_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_support_float_in_output, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_symint_in_slice, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_symint_input, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_tensor_and_unbacked_symbol_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_tensor_to_list_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_tensor_with_unbacked_shape_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_unbacked_symbol_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_vmap_multiply_scalar, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_vmap_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_all_kwarg, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_allow_local_assign_in_body_fn, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_default, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_default_else_branch, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_default_if_branch, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_int, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_only, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_recompile, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_args_nested, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_args_not_const_symint_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_args_with_symint_constant, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_kwargs, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_subgraph_name_is_valid, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_dual_level_guard, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_emit_functorch_guard_if_active, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_grad_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_jvp_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_linearize_recompiles, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_grad_guard_ok, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_grad_vmap_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_guard_fail_different_state, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_guard_ok, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_recompile_different_states, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_disable_inline_nn_module, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_call_compiled_backward_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_call_torch_compile_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_capture_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_closure_scalar, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_fn_with_kwargs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_freevar_python_scalar, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_freevar_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_non_tensor_input, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_over_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_pytree, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_recompile, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_two_tensor_all_grad_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_two_tensor_has_aux, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_with_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_with_side_effect, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_hessian, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_hessian_argnums, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd_randomness, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd_two_tensors_argnums, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacrev, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacrev_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacrev_two_tensors_argnums, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_call_torch_compile_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_freevar_python_scalar, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_freevar_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_jvp, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_simple, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_enable_disable_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_two_tensors_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_linearize_jvp_fn, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_call_compiled_backward_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_multiple_outputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_multiple_outputs_python_struct, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_call_compiled_backward_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_call_torch_compile_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_free_const, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_free_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_get_wrapped, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_kwargs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_in_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_out_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_outputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_diff_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_out_dims_tuple, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_new_tensor_implicit_via_op, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_new_tensor_in_body, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_new_tensor_unused_in_body, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_out_dims_None, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_over_vmap_captured, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_over_vmap_two_inputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_previous_illegal_op_no_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_pytree_inputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile_different_config, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile_same_config, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile_with_randomness, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_side_effects, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_side_effects_append_input, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_two_inputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_two_inputs_tuple_in_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_conditional_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_graph_break_2, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_graph_break_lambda, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_cond_with_invalid_kwargs, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_cond_with_kwargs, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_cond_with_mismatched_output, 
test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_dropout, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_dropout_inductor, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_fallback, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_flop_counter_for_cond, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_flop_counter_for_cond_unbalanced_branches, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_flop_counter_for_nested_cond, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_function, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_function_with_kwargs, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_module, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_non_aliasing_util, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_override_fallthrough_dispatch_key, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_auto_functionalize_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_cond_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_invoke_quant_packed_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_invoke_quant_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_invoke_subgraph_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_while_loop_stack_output_simple_cuda_float32, 
test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_auto_functionalize_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_cond_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_invoke_quant_packed_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_invoke_quant_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_invoke_subgraph_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_while_loop_stack_output_simple_cuda_float32 2025-09-07T07:34:07.7006841Z 2025-09-07T07:34:07.7007016Z Running dynamo/test_comptime 1/1 ... [2025-09-07 07:34:07.686395] 2025-09-07T07:34:07.7007378Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:07.7008379Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_comptime.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:07.686739] 2025-09-07T07:34:11.8576690Z 2025-09-07T07:34:11.8577508Z dynamo/test_comptime 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_comptime_1.1_d86bb629664fe760_.log 2025-09-07T07:34:11.8581581Z Running 12 items in this shard: test/dynamo/test_comptime.py::ComptimeTests::test_get_local, test/dynamo/test_comptime.py::ComptimeTests::test_get_local_closure_variable, test/dynamo/test_comptime.py::ComptimeTests::test_graph_break, test/dynamo/test_comptime.py::ComptimeTests::test_print_bt, test/dynamo/test_comptime.py::ComptimeTests::test_print_direct, test/dynamo/test_comptime.py::ComptimeTests::test_print_disas, test/dynamo/test_comptime.py::ComptimeTests::test_print_graph, test/dynamo/test_comptime.py::ComptimeTests::test_print_guards, test/dynamo/test_comptime.py::ComptimeTests::test_print_locals, test/dynamo/test_comptime.py::ComptimeTests::test_print_single, test/dynamo/test_comptime.py::ComptimeTests::test_print_value_stack, test/dynamo/test_comptime.py::ComptimeTests::test_sleep 2025-09-07T07:34:11.8585451Z 2025-09-07T07:34:11.8585779Z Running test_datapipe 1/1 ... [2025-09-07 07:34:11.857852] 2025-09-07T07:34:11.8586226Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:11.8587396Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_datapipe.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:11.858228] 2025-09-07T07:34:15.8286722Z 2025-09-07T07:34:15.8287429Z test_datapipe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_datapipe_1.1_f87d76d37fe0ebf8_.log 2025-09-07T07:34:15.8314091Z Running 94 items in this shard: test/test_datapipe.py::TestDataChunk::test_as_string, test/test_datapipe.py::TestDataChunk::test_getitem, test/test_datapipe.py::TestDataChunk::test_iter, test/test_datapipe.py::TestDataChunk::test_len, test/test_datapipe.py::TestDataChunk::test_random_shuffle, test/test_datapipe.py::TestDataChunk::test_reverse, test/test_datapipe.py::TestDataChunk::test_sort, test/test_datapipe.py::TestStreamWrapper::test_api, test/test_datapipe.py::TestStreamWrapper::test_dir, test/test_datapipe.py::TestStreamWrapper::test_pickle, test/test_datapipe.py::TestStreamWrapper::test_repr, test/test_datapipe.py::TestIterableDataPipeBasic::test_demux_mux_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_groupby_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_listdirfiles_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_listdirfilesdeterministic_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_map_with_col_file_handle_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_openfilesfromdisk_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_routeddecoder_iterable_datapipe, test/test_datapipe.py::TestCaptureDataFrame::test_basic_capture, test/test_datapipe.py::TestDataFramesPipes::test_batch, test/test_datapipe.py::TestDataFramesPipes::test_capture, test/test_datapipe.py::TestDataFramesPipes::test_collate, test/test_datapipe.py::TestDataFramesPipes::test_filter, test/test_datapipe.py::TestDataFramesPipes::test_shuffle, test/test_datapipe.py::TestDataFramesPipes::test_unbatch, test/test_datapipe.py::TestFunctionalIterDataPipe::test_batch_iterdatapipe, 
test/test_datapipe.py::TestFunctionalIterDataPipe::test_collate_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_concat_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_demux_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_docstring, test/test_datapipe.py::TestFunctionalIterDataPipe::test_filter_datapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_fork_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_iterable_wrapper_datapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_dict_with_col_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_tuple_list_with_col_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_mux_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_sampler_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_serializable, test/test_datapipe.py::TestFunctionalIterDataPipe::test_serializable_with_dill, test/test_datapipe.py::TestFunctionalIterDataPipe::test_shuffler_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_stream_reader_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_unbatch_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_zip_iterdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_batch_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_concat_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_docstring, test/test_datapipe.py::TestFunctionalMapDataPipe::test_map_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_sequence_wrapper_datapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_serializable, test/test_datapipe.py::TestFunctionalMapDataPipe::test_serializable_with_dill, test/test_datapipe.py::TestFunctionalMapDataPipe::test_shuffler_mapdatapipe, 
test/test_datapipe.py::TestFunctionalMapDataPipe::test_zip_mapdatapipe, test/test_datapipe.py::TestTyping::test_compile_time, test/test_datapipe.py::TestTyping::test_construct_time, test/test_datapipe.py::TestTyping::test_isinstance, test/test_datapipe.py::TestTyping::test_issubinstance, test/test_datapipe.py::TestTyping::test_protocol, test/test_datapipe.py::TestTyping::test_reinforce, test/test_datapipe.py::TestTyping::test_runtime, test/test_datapipe.py::TestTyping::test_subtype, test/test_datapipe.py::TestGraph::test_simple_traverse, test/test_datapipe.py::TestGraph::test_traverse_circular_datapipe, test/test_datapipe.py::TestGraph::test_traverse_forked, test/test_datapipe.py::TestGraph::test_traverse_mapdatapipe, test/test_datapipe.py::TestGraph::test_traverse_mixdatapipe, test/test_datapipe.py::TestGraph::test_traverse_unhashable_datapipe, test/test_datapipe.py::TestSerialization::test_spawn_lambdas_iter, test/test_datapipe.py::TestSerialization::test_spawn_lambdas_map, test/test_datapipe.py::TestCircularSerialization::test_circular_serialization_with_dill, test/test_datapipe.py::TestCircularSerialization::test_circular_serialization_with_pickle, test/test_datapipe.py::TestSharding::test_legacy_custom_sharding, test/test_datapipe.py::TestSharding::test_legacy_custom_sharding_with_old_dataloader, test/test_datapipe.py::TestSharding::test_multi_sharding, test/test_datapipe.py::TestSharding::test_old_dataloader, test/test_datapipe.py::TestSharding::test_sharding_groups, test/test_datapipe.py::TestSharding::test_sharding_groups_in_legacy_grouping_package, test/test_datapipe.py::TestSharding::test_sharding_length, test/test_datapipe.py::TestSharding::test_simple_sharding, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_buggy, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_constraint_multiple_outputs, 
test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_generator, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_new_object, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_self_next, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_generator_function, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_generator_function_exception, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_next, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_next_exception, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_return_self, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_custom_non_generator, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_custom_self_next, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph_repeated, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph_with_serialization 2025-09-07T07:34:15.8337367Z 2025-09-07T07:34:15.8337542Z Running dynamo/test_logging 1/1 ... [2025-09-07 07:34:15.828787] 2025-09-07T07:34:15.8337893Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:15.8338793Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_logging.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:15.829160] 2025-09-07T07:34:23.4049899Z 2025-09-07T07:34:23.4050830Z dynamo/test_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_logging_1.1_3a02c332f4c8f71a_.log 2025-09-07T07:34:23.4065029Z Running 51 items in this shard: test/dynamo/test_logging.py::LoggingTests::test_all, test/dynamo/test_logging.py::LoggingTests::test_aot, test/dynamo/test_logging.py::LoggingTests::test_aot_graphs, test/dynamo/test_logging.py::LoggingTests::test_aot_joint_graph, test/dynamo/test_logging.py::LoggingTests::test_autotuning, test/dynamo/test_logging.py::LoggingTests::test_bytecode, test/dynamo/test_logging.py::LoggingTests::test_cudagraph_static_inputs, test/dynamo/test_logging.py::LoggingTests::test_cudagraphs, test/dynamo/test_logging.py::LoggingTests::test_custom_format, test/dynamo/test_logging.py::LoggingTests::test_custom_format_exc, test/dynamo/test_logging.py::LoggingTests::test_ddp_graphs, test/dynamo/test_logging.py::LoggingTests::test_default_logging, test/dynamo/test_logging.py::LoggingTests::test_distributed_rank_logging, test/dynamo/test_logging.py::LoggingTests::test_dump_compile_times, test/dynamo/test_logging.py::LoggingTests::test_dynamo_debug, test/dynamo/test_logging.py::LoggingTests::test_dynamo_debug_default_off_artifacts, test/dynamo/test_logging.py::LoggingTests::test_dynamo_error, test/dynamo/test_logging.py::LoggingTests::test_dynamo_info, test/dynamo/test_logging.py::LoggingTests::test_fusion, test/dynamo/test_logging.py::LoggingTests::test_graph_breaks, test/dynamo/test_logging.py::LoggingTests::test_graph_region_expansion, test/dynamo/test_logging.py::LoggingTests::test_guards_polyfill_sloc, test/dynamo/test_logging.py::LoggingTests::test_guards_recompiles, test/dynamo/test_logging.py::LoggingTests::test_guards_sloc, test/dynamo/test_logging.py::LoggingTests::test_guards_sloc_vr, test/dynamo/test_logging.py::LoggingTests::test_hierarchical_compile, 
test/dynamo/test_logging.py::LoggingTests::test_inductor_debug, test/dynamo/test_logging.py::LoggingTests::test_inductor_error, test/dynamo/test_logging.py::LoggingTests::test_inductor_info, test/dynamo/test_logging.py::LoggingTests::test_invalid_artifact_flag, test/dynamo/test_logging.py::LoggingTests::test_invalid_artifact_flag_error_msg, test/dynamo/test_logging.py::LoggingTests::test_kernel_code, test/dynamo/test_logging.py::LoggingTests::test_log_traced_frames, test/dynamo/test_logging.py::LoggingTests::test_logs_out, test/dynamo/test_logging.py::LoggingTests::test_multiline_format, test/dynamo/test_logging.py::LoggingTests::test_open_registration, test/dynamo/test_logging.py::LoggingTests::test_open_registration_python_api, test/dynamo/test_logging.py::LoggingTests::test_open_registration_with_registered_parent, test/dynamo/test_logging.py::LoggingTests::test_optimizer_non_static_param, test/dynamo/test_logging.py::LoggingTests::test_output_code, test/dynamo/test_logging.py::LoggingTests::test_recompiles, test/dynamo/test_logging.py::LoggingTests::test_schedule, test/dynamo/test_logging.py::LoggingTests::test_trace_call, test/dynamo/test_logging.py::LoggingTests::test_trace_call_graph_break, test/dynamo/test_logging.py::LoggingTests::test_trace_call_inline_call, test/dynamo/test_logging.py::LoggingTests::test_trace_call_prefix, test/dynamo/test_logging.py::LoggingTests::test_trace_source_cond, test/dynamo/test_logging.py::LoggingTests::test_trace_source_funcname, test/dynamo/test_logging.py::LoggingTests::test_trace_source_if_stmt, test/dynamo/test_logging.py::LoggingTests::test_trace_source_nested, test/dynamo/test_logging.py::LoggingTests::test_trace_source_simple 2025-09-07T07:34:23.4076322Z 2025-09-07T07:34:23.4076505Z Running dynamo/test_debug_utils 1/1 ... 
[2025-09-07 07:34:23.405078] 2025-09-07T07:34:23.4076869Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:23.4077773Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_debug_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:34:23.405441] 2025-09-07T07:34:27.8769299Z 2025-09-07T07:34:27.8770151Z dynamo/test_debug_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_debug_utils_1.1_6d8ef44fb217e87a_.log 2025-09-07T07:34:27.8772592Z Running 4 items in this shard: test/dynamo/test_debug_utils.py::TestDebugUtilsCUDA::test_cast_model_to_fp64_dtype_args_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsCUDA::test_generate_env_vars_string_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsDeviceCUDA::test_aot_graph_parser_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsDeviceCUDA::test_sym_aot_graph_parser_cuda 2025-09-07T07:34:27.8774498Z 2025-09-07T07:34:27.8774706Z Running test_out_dtype_op 1/1 ... [2025-09-07 07:34:27.877068] 2025-09-07T07:34:27.8775150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:27.8776915Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_out_dtype_op.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:27.877486] 2025-09-07T07:34:31.8980413Z 2025-09-07T07:34:31.8981166Z test_out_dtype_op 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_out_dtype_op_1.1_0c734d5a1a84a783_.log 2025-09-07T07:34:31.8985520Z Running 12 items in this shard: test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_dynamo, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_inductor_decomp, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_inductor_decomp_trace, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_int_mm_default_trace, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_make_fx, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_mm_numerical, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_mul_scalar_numerical, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_no_autograd, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_non_functional, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_non_op_overload, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_op_functional, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_wrong_output 2025-09-07T07:34:31.8989171Z 2025-09-07T07:34:31.8989422Z Running functorch/test_eager_transforms 1/1 ... [2025-09-07 07:34:31.898130] 2025-09-07T07:34:31.8989870Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:31.8991246Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_eager_transforms.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:31.898469] 2025-09-07T07:34:37.8717999Z 2025-09-07T07:34:37.8718975Z functorch/test_eager_transforms 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_eager_transforms_1.1_1658b5bb9967e1fe_.log 2025-09-07T07:34:37.8849874Z Running 355 items in this shard: test/functorch/test_eager_transforms.py::TestSliceArgnums::test_argnums_reorders, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_duplicate_argnums, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_flat_args_with_negative_int_argnum, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_flat_args_with_positive_int_argnum, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_flat_args_with_tuple_argnum, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_invalid_argnum_type, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_not_enough_argnums, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_out_of_bounds_argnum_values, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_pytree_args, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_buffer_tying, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_combine_state_for_ensemble_error, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_combine_state_for_ensemble_smoke, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_correctness_mnist_mechanism_functional_call, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_correctness_mnist_mechanism_make_functional, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_disable_autograd_tracking_disable_autograd_tracking_False, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_disable_autograd_tracking_disable_autograd_tracking_True, 
test/functorch/test_eager_transforms.py::TestMakeFunctional::test_make_functional_state_correctly_returned_after_forward_mechanism_functional_call, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_make_functional_state_correctly_returned_after_forward_mechanism_make_functional, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_parameter_tying, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_parameter_tying_ensemble, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_parameter_tying_grad, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_error, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_leaf, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_mismatch_error, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_smoke, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_using_detach_functional_call_detach_params_False, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_using_detach_functional_call_detach_params_True, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_with_buffers_disable_autograd_tracking_disable_autograd_tracking_False, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_with_buffers_disable_autograd_tracking_disable_autograd_tracking_True, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_advanced_indexing_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_argnums_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composed_with_autograd_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composite_complicated_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composite_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composite_two_ops_cuda, 
test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_conj_bit_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_dtype_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_escaped_wrappers_are_ignored_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_escaped_wrappers_are_marked_as_dead_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_fn_with_kwargs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_functional_init_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_functional_init_with_buffers_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_aux_pytree_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_aux_tensor_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_of_vjp_composition_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_of_vjp_of_grad_composition_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_pytree_inputs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_on_captures_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_on_view_base_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_on_view_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_invalid_argnums_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_is_cuda_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_manual_seed_inside_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_negative_argnums_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_nesting_simple_cuda, 
test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_inside_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_mixed_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_nested_complicated_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_nested_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_vjp_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_vjp_fn_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_vjp_only_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_value_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_numel_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_out_of_order_argnums_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_primitive_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_print_captured_tensor_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_shape_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_ctor_inside_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_grad_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_vmap_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_vmap_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_vmap_vmap_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_grad_cuda, 
test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_hessian_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_vjp_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_vjp_multiple_inputs_outputs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_view_inplace_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_views_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_aux_pytree_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_aux_tensor_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_of_grad_composition_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_outputs_can_any_pytree_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_pytree_error_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_pytree_input_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_pytree_output_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_two_outputs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_zero_grad_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_log_softmax_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_new_empty_materializes_tensor_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_new_zeros_materializes_tensor_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_embeddingnet_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_embeddingnet_mechanism_make_functional_cuda, 
test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_inplace_view_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_simple_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_correctness_different_devices_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_correctness_different_devices_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_default_arg_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_default_arg_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_multi_output_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_multi_output_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_unrelated_outputs_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_unrelated_outputs_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_zero_dim_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_zero_dim_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_defaults_to_zero_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_defaults_to_zero_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_effect_on_return_jacfwd_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_effect_on_return_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_tuple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_tuple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_tensor_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_tensor_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev__preallocate_and_copy_False_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev__preallocate_and_copy_True_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_chunksize_one__preallocate_and_copy_False_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_chunksize_one__preallocate_and_copy_True_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_composition__preallocate_and_copy_False_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_composition__preallocate_and_copy_True_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_complex_error_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_diff_numel_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_diff_numel_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_dimensionality_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_dimensionality_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_output_jacfwd_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_output_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_float_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_float_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_hessian_simple_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_inplace_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_inplace_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_jac_with_non_tensor_args_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_jac_with_non_tensor_args_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_args_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_args_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_multidim_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_multidim_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_multiple_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_multiple_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_single_argnums_jacfwd_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_single_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_negative_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_negative_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_nested_jac_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_nested_jac_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_out_of_bounds_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_out_of_bounds_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_outputs_can_any_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_outputs_can_any_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_repeated_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_repeated_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_not_flat_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_not_flat_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_take_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_take_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_input_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_input_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_output_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_output_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_vmap_on_jac_simple_jacfwd_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_vmap_on_jac_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_autograd_function_disables_fwd_grad_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_aux_pytree_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_aux_tensor_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_disable_fwd_grad_inside_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_disable_fwd_grad_mixed_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_disable_fwd_grad_outside_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_inplace_on_captures_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_inputs_are_tuples_of_tensors_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_jvp_inside_autograd_function_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_jvp_new_tensor_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_multiple_inputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_multiple_inputs_outputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_multiple_outputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_nonempty_primals_and_tangents_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_outputs_can_any_pytree_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_primals_tangents_length_mismatch_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_pytree_inputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_pytree_inputs_error_cases_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_simple_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_strict_mode_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_unrelated_input_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_unrelated_output_cuda, 
test/functorch/test_eager_transforms.py::TestJvpCUDA::test_zerotensor_vmapjvp_interaction_cuda, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_basic_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_composition_grad_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_composition_vmap_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_errors_cuda, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_nested_input_nested_output_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_return_cuda_float32, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_all_dual_base_inplace_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_all_dual_base_view_inplace_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_all_dual_no_view_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_right_dual_base_prop_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_right_dual_view_prop_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_hessian_vectorize_correctness_multi_input_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_hessian_vectorize_correctness_simple_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_hessian_vectorize_correctness_unrelated_outputs_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_jacfwd_different_levels_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_functionalize_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_grad_and_value_cuda, 
test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_hessian_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_jacfwd_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_jacrev_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_jvp_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_vjp_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_can_use_functionalize_when_key_is_excluded_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_can_use_grad_when_key_is_excluded_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_can_use_vmap_when_key_is_excluded_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_functionalize_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_grad_and_value_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_hessian_cuda, 
test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_grad_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_grad_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_grad_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_jvp_supports_saved_tensor_hooks_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_make_fx_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_make_fx_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_make_fx_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_no_warning_on_import_functorch_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_requires_grad_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_retain_grad_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_grad_and_value_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_hessian_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_doesnt_support_saved_tensor_hooks_cuda, 
test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vmap_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vmap_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vmap_vmap_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_ensemble_regression_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_ensemble_regression_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_AlphaDropout_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_AlphaDropout_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_Dropout_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_Dropout_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_FeatureAlphaDropout_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_FeatureAlphaDropout_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_lennard_jones_batched_jac_jac_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_lennard_jones_batched_jac_jac_jacrev_cuda, 
test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_omniglot_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_omniglot_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_regression_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_regression_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_resnet18_per_sample_grads_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_resnet18_per_sample_grads_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_functional_call_originally_track_running_stats_False_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_functional_call_originally_track_running_stats_True_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_make_functional_originally_track_running_stats_False_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_make_functional_originally_track_running_stats_True_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_basic_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_functional_call_multiple_dicts_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_grad_grad_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_grad_name_wrapping_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_grad_sum_cuda, 
test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_no_grad_inside_grad_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_no_grad_outside_grad_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_vmap_grad_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_vmap_sum_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fake_tensors_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_multi_out_op_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_out_op_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_reapply_views_simple_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_simple_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_transpose_simple_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_grad_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_nonfunctional_output_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_opt_tensor_list_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_optional_tensorlist1_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_optional_tensorlist2_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_inplace_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_linear_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_multioutput_inplace_slice_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_multioutput_view_cuda, 
test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_resize_program_inputs_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_simple_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_vmap_functionalize_jvp_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_neither_mark_dirty_False_cuda, 
test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_input_mark_dirty_True_cuda, 
test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_grad_fn_name_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_needs_input_grads_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_once_differentiable_autograd_vjp_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_once_differentiable_grad_vjp_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_set_materialize_grads_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_has_vmap_staticmethod_and_has_generate_vmap_rule_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_in_dims_multiple_inputs_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_in_dims_single_input_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_incompatible_out_dims_error_msg_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_info_object_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_kwarg_only_tensors_cuda, 
test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_no_vmap_staticmethod_and_no_generate_vmap_rule_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_none_returns_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_should_have_two_returns_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_skips_empty_layer_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_error_if_name_collision_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_nesting_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_overrides_saved_tensors_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_passthrough_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_debug_unwrap_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_reductify_leaf_cuda, test/functorch/test_eager_transforms.py::TestCompileTransformsCUDA::test_compile_vmap_hessian_cuda, test/functorch/test_eager_transforms.py::TestCompileTransformsCUDA::test_grad_deprecated_api_cuda 2025-09-07T07:34:37.8971530Z 2025-09-07T07:34:37.8971795Z Running export/test_hop 1/1 ... [2025-09-07 07:34:37.872568] 2025-09-07T07:34:37.8972149Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:37.8973042Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_hop.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:37.872920] 2025-09-07T07:34:42.7950512Z 2025-09-07T07:34:42.7951248Z export/test_hop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_hop_1.1_3f96666ee6a59504_.log 2025-09-07T07:34:42.7968204Z Running 40 items in this shard: test/export/test_hop.py::TestHOPCUDA::test_aot_export_auto_functionalize_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_cond_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_flex_attention_backward_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_flex_attention_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_invoke_quant_packed_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_invoke_quant_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_invoke_subgraph_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_scan_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_while_loop_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_aot_export_while_loop_stack_output_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_auto_functionalize_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_cond_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_flex_attention_backward_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_flex_attention_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_invoke_quant_packed_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_invoke_quant_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_invoke_subgraph_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_scan_simple_cuda_float32, 
test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_while_loop_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_pre_dispatch_export_while_loop_stack_output_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_auto_functionalize_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_cond_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_flex_attention_backward_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_flex_attention_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_invoke_quant_packed_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_invoke_quant_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_invoke_subgraph_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_scan_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_while_loop_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_retrace_export_while_loop_stack_output_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_auto_functionalize_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_cond_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_flex_attention_backward_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_flex_attention_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_invoke_quant_packed_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_invoke_quant_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_invoke_subgraph_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_scan_simple_cuda_float32, 
test/export/test_hop.py::TestHOPCUDA::test_serialize_export_while_loop_simple_cuda_float32, test/export/test_hop.py::TestHOPCUDA::test_serialize_export_while_loop_stack_output_simple_cuda_float32 2025-09-07T07:34:42.7980428Z 2025-09-07T07:34:42.7980641Z Running profiler/test_cpp_thread 1/1 ... [2025-09-07 07:34:42.795314] 2025-09-07T07:34:42.7981027Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:42.7981959Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_cpp_thread.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:34:42.795718] 2025-09-07T07:34:48.0679508Z 2025-09-07T07:34:48.0680341Z profiler/test_cpp_thread 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_cpp_thread_1.1_2968a035a017d289_.log 2025-09-07T07:34:48.0683864Z Running 6 items in this shard: test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_profile_memory_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_with_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_without_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_profile_memory_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_with_enable_profiler_in_child_thread_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_without_enable_profiler_in_child_thread_xpu 2025-09-07T07:34:48.0687684Z 2025-09-07T07:34:48.0687944Z Running dynamo/test_aot_autograd_cache 1/1 ... 
[2025-09-07 07:34:48.068242] 2025-09-07T07:34:48.0688427Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:48.0689590Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_aot_autograd_cache.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:34:48.068669] 2025-09-07T07:34:55.6444360Z 2025-09-07T07:34:55.6445185Z dynamo/test_aot_autograd_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_aot_autograd_cache_1.1_310faae5fcaa1b4c_.log 2025-09-07T07:34:55.6485798Z Running 102 items in this shard: test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_aot_runtime_trace_joint, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_function, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_guard_single_entry_device_cuda_bfloat16, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_guard_single_entry_device_cuda_float16, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_inductor_guards_device_cuda_bfloat16_requires_grad_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_inductor_guards_device_cuda_bfloat16_requires_grad_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_inductor_guards_device_cuda_float16_requires_grad_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_inductor_guards_device_cuda_float16_requires_grad_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_lazy_backward, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_autograd_no_dynamo_trace_backward, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_basic, 
test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cpu_bfloat16_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cpu_bfloat16_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cpu_float32_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cpu_float32_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cuda_bfloat16_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cuda_bfloat16_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cuda_float32_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_hot_load_device_cuda_float32_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_cache_lazy_backward_for_compiled_autograd, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_clear_fx_graph_cache, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_compiled_autograd_bypass, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_constant_tensor_device_guards, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_custom_autograd_function, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_custom_autograd_function_miss, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_custom_autograd_function_with_custom_triton_kernel, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_custom_autograd_function_with_custom_triton_kernel_cache_invalidation, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_dynamic_shapes_different_sizes, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_fx_graph_cache_off, 
test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_inference_graph_cache_hit_with_compiled_autograd_enabled, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_invoke_subgraph, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_multi_graph_specialization, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_multiple_compile_triton_kernels, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_nn_module_with_params_global_constant, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_non_bundled_to_bundled_config_change, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_saved_tensors_hooks_autograd_cache, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_saved_tensors_hooks_autograd_cache_symbolic, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_symbol_specialization, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_triton_op_cache_invalidation, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_triton_op_cache_multiple_ops_invalidation, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_unsafe_mark_cacheable_fn_select_allow_in_graph, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_unsafe_mark_cacheable_fn_select_tag_activation_checkpoint, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_view_replay_bypass, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheTests::test_vmap, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_aot_runtime_trace_joint, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_function, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_guard_single_entry_device_cuda_bfloat16, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_guard_single_entry_device_cuda_float16, 
test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_inductor_guards_device_cuda_bfloat16_requires_grad_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_inductor_guards_device_cuda_bfloat16_requires_grad_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_inductor_guards_device_cuda_float16_requires_grad_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_inductor_guards_device_cuda_float16_requires_grad_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_lazy_backward, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_autograd_no_dynamo_trace_backward, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_basic, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cpu_bfloat16_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cpu_bfloat16_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cpu_float32_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cpu_float32_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cuda_bfloat16_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cuda_bfloat16_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cuda_float32_dynamic_False, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_hot_load_device_cuda_float32_dynamic_True, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_cache_lazy_backward_for_compiled_autograd, 
test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_clear_fx_graph_cache, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_compiled_autograd_bypass, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_constant_tensor_device_guards, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_custom_autograd_function, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_custom_autograd_function_miss, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_custom_autograd_function_with_custom_triton_kernel, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_custom_autograd_function_with_custom_triton_kernel_cache_invalidation, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_dynamic_shapes_different_sizes, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_fx_graph_cache_off, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_inference_graph_cache_hit_with_compiled_autograd_enabled, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_invoke_subgraph, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_multi_graph_specialization, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_multiple_compile_triton_kernels, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_nn_module_with_params_global_constant, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_non_bundled_to_bundled_config_change, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_saved_tensors_hooks_autograd_cache, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_saved_tensors_hooks_autograd_cache_symbolic, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_symbol_specialization, 
test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_triton_op_cache_invalidation, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_triton_op_cache_multiple_ops_invalidation, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_unsafe_mark_cacheable_fn_select_allow_in_graph, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_unsafe_mark_cacheable_fn_select_tag_activation_checkpoint, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_view_replay_bypass, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCacheBundledTests::test_vmap, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_basic_hash_key, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_different_configs, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_different_global_configs, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_different_graphs, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_different_inputs, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_freezing, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_identical_graphs_and_configs, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_incompatible_function, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_nn_module_with_params, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_normal_torch_function, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_private_builtin, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_private_namespace, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_safe_torchfunction, test/dynamo/test_aot_autograd_cache.py::AOTAutogradCachePicklerTests::test_sanitize_gm_for_cache 
2025-09-07T07:34:55.6522361Z
2025-09-07T07:34:55.6522581Z Running inductor/test_auto_functionalize 1/1 ... [2025-09-07 07:34:55.644757]
2025-09-07T07:34:55.6523127Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:34:55.6524183Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_auto_functionalize.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:34:55.645114]
2025-09-07T07:34:56.8010951Z
2025-09-07T07:34:56.8011723Z test_ops_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_jit_1.1_08e9dea06f9b102d_.log
2025-09-07T07:34:56.8346871Z Running 1139 items in this shard: test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_abs_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_acos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_acosh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_asin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_asinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atan2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atan_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_cat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_clamp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_digamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_div_floor_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_div_no_rounding_mode_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_div_trunc_rounding_cuda_float32,
test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erfc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erfinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_exp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_expm1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_ge_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_gt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_i0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_igamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_igammac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_le_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_lgamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_det_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_householder_product_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_inv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_matrix_power_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_log1p_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_log_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_logit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_logsumexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_lt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mH_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_matmul_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_matrix_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_max_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_min_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_movedim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_ne_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_neg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv_transpose1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv_transpose2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv_transpose3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_group_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_layer_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_rms_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_outer_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_neg_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_sigmoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_sinc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_sub_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_tanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_transpose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_trunc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_vstack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_xlogy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_H_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_H_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_T_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_T_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___getitem___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___getitem___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___radd___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___radd___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rdiv___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rdiv___cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmatmul___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmatmul___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmod___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmul___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmul___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rpow___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rpow___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rsub___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rsub___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__batch_norm_with_update_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__chunk_cat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__chunk_cat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__native_batch_norm_legit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__segment_reduce_lengths_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__segment_reduce_offsets_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__softmax_backward_data_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_put_accumulate_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__upsample_bilinear2d_aa_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_abs_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_abs_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acos_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acosh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acosh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_add_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addbmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addbmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addcdiv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addcdiv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addcmul_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addcmul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_decomposed_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_decomposed_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addr_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_alias_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_alias_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_all_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_all_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_allclose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_allclose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_aminmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_angle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_angle_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_any_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_any_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_arange_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argsort_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argwhere_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argwhere_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_partial_views_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_partial_views_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_asin_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_asin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_asinh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_asinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atan2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atan_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atan_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atanh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_2d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_3d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_baddbmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_baddbmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bernoulli_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bfloat16_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bfloat16_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_block_diag_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_block_diag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bool_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bool_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_shapes_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_tensors_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_to_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_to_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bucketize_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_byte_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_byte_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cartesian_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cartesian_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cauchy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdist_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdouble_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdouble_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ceil_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cfloat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cfloat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chalf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chalf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_char_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_char_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_inverse_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_inverse_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chunk_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chunk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_max_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_min_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clone_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clone_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_column_stack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_column_stack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_combinations_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_combinations_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_complex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_physical_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_physical_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_constant_pad_nd_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_constant_pad_nd_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_contiguous_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_contiguous_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_copysign_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_corrcoef_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_corrcoef_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cos_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cosh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cosh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_count_nonzero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_count_nonzero_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cov_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cov_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cross_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cross_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cummax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cummin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumprod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumprod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumulative_trapezoid_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumulative_trapezoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_deg2rad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_embed_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_embed_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagflat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagflat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diff_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diff_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_digamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dist_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dist_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_floor_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_no_rounding_mode_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_no_rounding_mode_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_trunc_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_double_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_double_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dsplit_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dsplit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dstack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dstack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_einsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_einsum_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_permuted_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_permuted_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_strided_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_strided_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_eq_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_eq_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_equal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_equal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erfc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erfinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_copy_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expm1_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expm1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exponential_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_eye_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_eye_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fftshift_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fftshift_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfftn_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftshift_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftshift_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fill_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flatten_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flatten_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flip_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flip_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fliplr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fliplr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flipud_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flipud_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_float_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_float_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_float_power_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_float_power_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_floor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_floor_divide_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fmod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_frac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_frexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_like_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gather_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gather_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ge_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geometric_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geqrf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geqrf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gradient_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gradient_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_grid_sampler_2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_grid_sampler_3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_half_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_half_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hash_tensor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_heaviside_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_histc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hsplit_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hsplit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hstack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hstack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hypot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_i0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_igamma_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_igammac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_imag_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_add_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_fill_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_put_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_put_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_inner_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_inner_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_int_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_int_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isclose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isclose_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isfinite_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isfinite_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isinf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isinf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isnan_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isnan_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isneginf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isposinf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isreal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isreal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_istft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_item_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_item_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_2inputs_2outputs_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_binary_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_binary_return_by_ref_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_unary_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_unary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kron_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kron_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kthvalue_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ldexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ldexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_le_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lerp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lerp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lgamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_ex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cond_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cond_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cross_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cross_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_det_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_det_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_diagonal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_diagonal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eig_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eig_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvals_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvals_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvalsh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvalsh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_householder_product_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_householder_product_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_inv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_inv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_inv_ex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_inv_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_ex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_solve_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_ex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_power_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_power_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_rank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_rank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_rank_hermitian_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_multi_dot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_multi_dot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_hermitian_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_hermitian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_singular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_singular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_qr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_qr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_slogdet_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_slogdet_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_ex_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_triangular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_triangular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svd_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svd_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svdvals_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svdvals_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorinv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorsolve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorsolve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vander_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vander_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vecdot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vecdot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vector_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vector_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_tensor_overload_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_tensor_overload_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log10_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log10_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log1p_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log1p_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_normal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_softmax_with_dtype_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logaddexp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logaddexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logcumsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logcumsumexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logdet_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logdet_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_and_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_and_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_not_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_not_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_or_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_or_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_xor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_xor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_tensor_overload_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_tensor_overload_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logsumexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_long_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_long_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_unpack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_unpack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mH_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mH_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mT_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mT_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_argmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_argmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumprod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumprod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_fill_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logaddexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logsumexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_mean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_median_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_norm_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_normalize_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_normalize_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_softmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_std_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_std_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_sum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_sum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_var_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_var_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matmul_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matmul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matrix_exp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matrix_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_max_binary_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_max_pool2d_with_indices_backward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_max_reduction_no_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_max_reduction_with_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_maximum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_median_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_list_of_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_list_of_tensors_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_variadic_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_variadic_tensors_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_reduction_no_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_reduction_with_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_minimum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mode_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_movedim_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_movedim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_msort_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mul_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_multinomial_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nan_to_num_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmedian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanquantile_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nansum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nansum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_narrow_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_narrow_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_narrow_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_narrow_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_batch_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_dropout_backward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_layer_norm_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ne_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ne_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_neg_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_neg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_empty_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_empty_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_empty_strided_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_empty_strided_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_full_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_full_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_ones_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_ones_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_zeros_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_zeros_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nextafter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool3d_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_alpha_dropout_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_avg_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_avg_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_avg_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_batch_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_celu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_channel_shuffle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_channel_shuffle_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv2d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv3d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose1d_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_cosine_similarity_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_cross_entropy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_ctc_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_elu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_embedding_bag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_embedding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_gelu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_glu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_grid_sample_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_group_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hardshrink_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hardsigmoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hardswish_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hardtanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_huber_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_instance_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_area_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_linear_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_nearest_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_kl_div_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_l1_loss_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_l1_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_layer_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_leaky_relu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_linear_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_linear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_local_response_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_logsigmoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_mish_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_mse_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multi_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_nll_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_normalize_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_normalize_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_circular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_circular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_constant_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_constant_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_reflect_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_reflect_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_replicate_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_replicate_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_replicate_negative_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pairwise_distance_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pairwise_distance_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pdist_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pixel_shuffle_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_prelu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_relu6_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_relu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rms_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rms_norm_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rrelu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_selu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_silu_complex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_silu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_soft_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softplus_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softshrink_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softsign_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softsign_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_tanhshrink_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_tanhshrink_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_threshold_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_triplet_margin_loss_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_unfold_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_unfold_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_upsample_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_upsample_nearest_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_static_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_static_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_fro_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_fro_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_inf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_inf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_nuc_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_nuc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_in_place_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_in_place_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_number_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ormqr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ormqr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_outer_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_outer_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pca_lowrank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pca_lowrank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pinverse_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pinverse_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polar_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_1_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_4_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_positive_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_positive_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pow_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pow_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_put_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_put_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_qr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_qr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_quantile_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rad2deg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rand_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rand_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randint_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randint_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randn_like_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randn_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ravel_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ravel_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_real_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_real_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reciprocal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reciprocal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_remainder_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_renorm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_renorm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_interleave_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_interleave_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resize__cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resize__cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resize_as__cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resize_as__cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_conj_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_conj_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_neg_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_neg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_roll_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_roll_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rot90_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rot90_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_decimals_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_decimals_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_decimals_neg_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rsqrt_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rsqrt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rsub_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rsub_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scalar_tensor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scalar_tensor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_add_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_sum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_searchsorted_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sgn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sgn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_short_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_short_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sigmoid_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sigmoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sign_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_bartlett_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_blackman_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_cosine_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_exponential_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_gaussian_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_general_cosine_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_general_hamming_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_hamming_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_hann_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_kaiser_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_nuttall_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signbit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sin_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinc_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_softmax_with_dtype_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sort_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_mm_reduce_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_sampled_addmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_sampled_addmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_airy_ai_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_j0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_j1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_y0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_y1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_u_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_v_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_w_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_entr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_erfcx_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_hermite_polynomial_h_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_hermite_polynomial_he_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_i0e_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_i1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_i1e_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_laguerre_polynomial_l_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_legendre_polynomial_p_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_log_ndtr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_modified_bessel_i0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_modified_bessel_i1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_modified_bessel_k0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_modified_bessel_k1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_ndtr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_ndtri_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_spherical_bessel_j0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_xlog1py_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_zeta_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_list_args_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_list_args_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sqrt_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sqrt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_square_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_square_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_squeeze_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_squeeze_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_squeeze_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_squeeze_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_squeeze_multiple_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_squeeze_multiple_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_mean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_mean_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_mean_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_mean_unbiased_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_unbiased_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sub_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sub_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_to_size_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_to_size_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_lowrank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_lowrank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_t_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_t_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_t_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_along_dim_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_along_dim_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tan_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tan_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tanh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tensor_split_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tensor_split_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tensordot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tensordot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tile_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tile_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_to_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_to_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_to_sparse_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_to_sparse_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_topk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_copy_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapezoid_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapezoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapz_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapz_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triangular_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tril_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tril_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triu_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_true_divide_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_true_divide_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trunc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unflatten_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unflatten_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unfold_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unfold_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unfold_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unfold_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_uniform_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_uniform_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unique_consecutive_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unique_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_chunk_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_chunk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_split_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_split_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_mean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_mean_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_mean_unbiased_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_unbiased_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vdot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vdot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_complex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_real_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vsplit_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vsplit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vstack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vstack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_where_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_where_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_xlogy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zero__cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zero__cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zeros_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zeros_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zeros_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zeros_like_cuda_float32 2025-09-07T07:34:56.8662814Z 2025-09-07T07:34:56.8663049Z Running torch_np/test_function_base 1/1 ... [2025-09-07 07:34:56.802981] 2025-09-07T07:34:56.8663539Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:56.8664666Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_function_base.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:34:56.803348] 2025-09-07T07:34:59.8157992Z 2025-09-07T07:34:59.8159066Z inductor/test_auto_functionalize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_auto_functionalize_1.1_86b96b6d6276fda8_.log 2025-09-07T07:34:59.8175224Z Running 39 items in this shard: test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_input_to_custom_op, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_output, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_default, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_none_return, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra1, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra3, 
test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra4, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra5, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_on_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_self_as_mutate_arg, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_tensorlist, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_can_auto_functionalize, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_graph_input_is_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode1_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode4_v2, 
test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_recompile, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_scheduling_with_multiple_mutates, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_try_use_slice, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_unbacked_auto_functionalize_op 2025-09-07T07:34:59.8188407Z 2025-09-07T07:34:59.8188679Z Running dynamo/test_activation_checkpointing 1/1 ... [2025-09-07 07:34:59.815988] 2025-09-07T07:34:59.8189317Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:59.8190393Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_activation_checkpointing.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:34:59.816365] 2025-09-07T07:35:00.5230115Z 2025-09-07T07:35:00.5231012Z torch_np/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_function_base_1.1_bac849290df60584_.log 2025-09-07T07:35:00.5232400Z Running 1 items in this shard: test/torch_np/test_function_base.py::TestAppend::test_basic 2025-09-07T07:35:00.5232910Z 2025-09-07T07:35:00.5233734Z Running cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic 1/1 ... 
[2025-09-07 07:35:00.523101] 2025-09-07T07:35:00.5234633Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:00.5236859Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:35:00.523427] 2025-09-07T07:35:07.4420512Z 2025-09-07T07:35:07.4421555Z dynamo/test_activation_checkpointing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_activation_checkpointing_1.1_d0ae425cb2869baa_.log 2025-09-07T07:35:07.4440933Z Running 32 items in this shard: test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_autocast_flash_attention_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_custom_rule_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_inplace_op_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_invalid_context_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_list_ops_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_must_not_recompute_gemm_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_must_recompute_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_outplace_op_cuda, 
test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_parametrization_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_partial_ctx_fn_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_random_op_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_tensor_subclass_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_triton_kernel_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_distributed_utils_checkpoint_wrapper_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_dynamo_does_not_trace_getattr_as_top_frame_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_error_msg_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_fallback_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_kwargs_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_list_inputs_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_pattern_matcher_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_symints_location_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_decomps_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_dropout_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_function_cuda, 
test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_function_via_global_checkpoint_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_function_with_kwargs_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_module_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_multiple_checkpoints_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_must_save_tensor_that_has_backward_hook_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_rand_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_recomputed_rand_cuda, test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_tags_sequential_layers_cuda 2025-09-07T07:35:07.4456312Z 2025-09-07T07:35:07.4456549Z Running dynamo/test_aot_autograd 1/1 ... [2025-09-07 07:35:07.442188] 2025-09-07T07:35:07.4457143Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:07.4458351Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_aot_autograd.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:35:07.442528] 2025-09-07T07:35:07.7474530Z 2025-09-07T07:35:07.7475725Z cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp_extensions.libtorch_agnostic_extension.test.test_libtorch_agnostic_1.1_2a03413186dec434_.log 2025-09-07T07:35:07.7490231Z Running 24 items in this shard: test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_default_constructor_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_device_guard_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_device_guard_set_index_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_divide_neg_exp_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_exp_neg_is_leaf_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_fill_infinity_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_get_current_device_index_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_identity_does_not_hog_memory_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_is_contiguous_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_abs_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_amax_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_amax_vec_cuda, 
test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_empty_like_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_is_cpu_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_narrow_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_new_empty_dtype_variant_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_new_zeros_dtype_variant_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_ones_like_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_pad_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_transpose_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_my_zero__cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_neg_exp_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_slow_sgd_cuda, test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py::TestLibtorchAgnosticCUDA::test_stream_cuda 2025-09-07T07:35:07.7501495Z 2025-09-07T07:35:07.7501910Z Running dynamo/test_graph_deduplication 1/1 ... 
[2025-09-07 07:35:07.747696] 2025-09-07T07:35:07.7502517Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:07.7503697Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_graph_deduplication.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:35:07.748040] 2025-09-07T07:35:11.7135317Z 2025-09-07T07:35:11.7136568Z dynamo/test_aot_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_aot_autograd_1.1_f005c21370a269c1_.log 2025-09-07T07:35:11.7157170Z Running 48 items in this shard: test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_LSTM, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_alias_inputs, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_autograd_expand_mutation_backwards, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_autograd_expand_mutation_error, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_autograd_expand_mutation_functionalizes, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_autograd_raises_invalid_leaf_set, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_export_joint_simple_repro, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_grad_mode_mutation, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_sequence_nr, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_non_tensor_arg, 
test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_non_tensor_arg_list, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_with_global, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_autograd_function_tangent_mutation, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_safe, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_unsafe, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_unsafe_control_flow, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_data_ptr_access_copy, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_data_ptr_access_fails_in_backward, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_data_ptr_access_fails_in_forward, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_different_inputs_overlapping_set_with_mutation, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer1, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer2, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer3, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer4, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer5, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer6, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph1, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph2, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph3, 
test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph4, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_double_backward_errors, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_eager_sequence_nr, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_grad_inputs_alias_inputs, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_inputs_overlapping_with_mutation_recompile, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_inputs_overlapping_with_mutation_stress, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_multiple_aot_autograd_calls_dupe_args, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_mutation, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_mutation1, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_negative_testing, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_negative_testing_mutation, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_nn_parameter_construction, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_no_storage_overlap_guards_no_aliasing, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_no_storage_overlap_guards_no_mutation, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_requires_grad_fake_via_dynamo_recompiles, test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_split_with_sizes_aot_autograd_cleans_up_traceback_meta 2025-09-07T07:35:11.7173268Z 2025-09-07T07:35:11.7173639Z Running test_model_exports_to_core_aten 1/1 ... 
[2025-09-07 07:35:11.713670] 2025-09-07T07:35:11.7174449Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:11.7175637Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_model_exports_to_core_aten.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:35:11.714040] 2025-09-07T07:35:11.7180196Z 2025-09-07T07:35:11.7180789Z dynamo/test_graph_deduplication 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_graph_deduplication_1.1_81872f735b2dbc91_.log 2025-09-07T07:35:11.7187721Z Running 18 items in this shard: test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_autocast_ordering, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_cycle_detection_arg_and_additional_deps, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_cycle_detection_complex, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_cycle_detection_no_cycle, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_cycle_detection_simple, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_cycle_detection_single_node, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_cycle_detection_two_node, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_dependent_subgraphs, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_input_aliasing, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_input_mutation, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_multiple_subgraphs, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_mutation_ordering, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_output_nodes_last, 
test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_param_transfer_to_submodule, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_single_subgraph, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_single_subgraph2, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_tuple_inputs, test/dynamo/test_graph_deduplication.py::GraphDededuplicationTests::test_tuple_return 2025-09-07T07:35:11.7193885Z 2025-09-07T07:35:11.7194130Z Running test_itt 1/1 ... [2025-09-07 07:35:11.718183] 2025-09-07T07:35:11.7194573Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:11.7195660Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_itt.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:35:11.718494] 2025-09-07T07:35:15.3879141Z 2025-09-07T07:35:15.3879844Z test_itt 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_itt_1.1_ba65fe12449f4ae5_.log 2025-09-07T07:35:15.3880807Z Running 1 items in this shard: test/test_itt.py::TestItt::test_itt 2025-09-07T07:35:15.3881462Z 2025-09-07T07:35:15.3881814Z Running test_modules 1/3 ... [2025-09-07 07:35:15.387961] 2025-09-07T07:35:15.3882409Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:15.3885050Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '-m', 'not serial', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:35:15.388284] 2025-09-07T07:35:15.7843422Z 2025-09-07T07:35:15.7844213Z test_model_exports_to_core_aten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_model_exports_to_core_aten_1.1_c9a50f53d817f574_.log 2025-09-07T07:35:15.7846005Z Running 1 items in this shard: test/test_model_exports_to_core_aten.py::TestQuantizePT2EModels::test_vit_aten_export 2025-09-07T07:35:15.7846633Z 2025-09-07T07:35:15.7847392Z Running test_modules 3/3 ... [2025-09-07 07:35:15.784600] 2025-09-07T07:35:15.7847915Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:35:15.7852384Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '-m', 'not serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:35:15.785042] 2025-09-07T07:36:35.0554034Z 2025-09-07T07:36:35.0555487Z inductor/test_torchinductor_opinfo 4/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_4.12_729e2453cdba6369_.log 2025-09-07T07:36:35.0672998Z Running 299 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_qr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_multinomial_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_glu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_nuc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_int64 2025-09-07T07:36:35.0797393Z 2025-09-07T07:36:35.0797644Z Running inductor/test_mps_basic 1/1 ... 
[2025-09-07 07:36:35.056031] 2025-09-07T07:36:35.0798131Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:36:35.0799166Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mps_basic.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:36:35.056373] 2025-09-07T07:36:42.6311295Z 2025-09-07T07:36:42.6312283Z inductor/test_mps_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mps_basic_1.1_b060ad5c5fc3cfab_.log 2025-09-07T07:36:42.6313505Z 2025-09-07T07:36:42.6314098Z Running test_decomp 2/22 ... [2025-09-07 07:36:42.631238] 2025-09-07T07:36:42.6314668Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:36:42.6318414Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=2', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:36:42.631593] 2025-09-07T07:37:15.9636703Z 2025-09-07T07:37:15.9637630Z test_modules 3/3 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_3.3_b24b95280ca060a9_.log 2025-09-07T07:37:16.0009702Z Running 1240 items in this shard: test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_eval_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ELU_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_complex128, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConvTranspose1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRUCell_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_errors_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HuberLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MSELoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Bilinear_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_forward_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GELU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReLU6_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BCELoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LSTM_eval_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool1d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Linear_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTM_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSoftmax_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SELU_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoder_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm1d_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm1d_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_NLLLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RMSNorm_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCELoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_FractionalMaxPool3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LayerNorm_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNN_eval_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ELU_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softsign_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm3d_eval_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RMSNorm_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoder_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCEWithLogitsLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCEWithLogitsLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CTCLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad3d_swap_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConstantPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CosineEmbeddingLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CosineEmbeddingLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Embedding_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_FractionalMaxPool2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRU_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRU_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_HingeEmbeddingLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm3d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm3d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_KLDivLoss_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LPPool1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LPPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LPPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTMCell_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTM_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTM_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LayerNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Linear_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LocalResponseNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSoftmax_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MSELoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Mish_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiLabelMarginLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiLabelSoftMarginLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PReLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PoissonNLLLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RMSNorm_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNNCell_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNN_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SELU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SiLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Sigmoid_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SmoothL1Loss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmax2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmax_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmin_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softplus_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softplus_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softsign_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanhshrink_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanhshrink_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerDecoderLayer_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoderLayer_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoder_eval_mode_swap_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoder_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Transformer_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCELoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_CELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CTCLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ELU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Embedding_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Embedding_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRUCell_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GaussianNLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_True_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_Hardswish_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardtanh_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HingeEmbeddingLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HingeEmbeddingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_L1Loss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTMCell_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSoftmax_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MSELoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelMarginLoss_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelMarginLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelSoftMarginLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelSoftMarginLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RMSNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RMSNorm_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Sigmoid_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanh_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerDecoderLayer_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoderLayer_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad2d_swap_True_set_grad_True_cuda_float32 2025-09-07T07:37:16.0357544Z 2025-09-07T07:37:16.0357719Z Running test_decomp 3/22 ... 
[2025-09-07 07:37:15.965467] 2025-09-07T07:37:16.0358299Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:37:16.0359284Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=3', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:37:15.965839] 2025-09-07T07:39:38.2479743Z 2025-09-07T07:39:38.2480713Z functorch/test_ops 2/3 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_2.3_f3ed1211a6d50f50_.log 2025-09-07T07:39:38.3564895Z Running 3471 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_cross_entropy_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_l1_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_log_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_mse_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmod___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdouble_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_qr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_group_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_shuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_positive_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tan_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___getitem___functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_to_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_no_rounding_mode_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log1p_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_zeros_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardswish_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_upsample_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_select_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpySortAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagflat_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_half_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_std_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_huber_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softplus_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rot90_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_legendre_polynomial_p_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_view_as_complex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ceil_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_tensor_with_scalar_list_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_T_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_diagonal_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_diagonal_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_hsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_jvp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_reshape_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_conj_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_neg_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_neg_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_select_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_jvp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_vjp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_transpose_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_transpose_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_complex_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_complex_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_block_diag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_put_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_fill_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_leaky_relu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_threshold_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_searchsorted_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_abs_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cosh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_add_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_layer_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_grid_sample_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_tanhshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rsub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_legendre_polynomial_p_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cat_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fftshift_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isreal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logaddexp2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_multinomial_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_prelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ravel_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_v_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapz_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeNotComposableAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyMulAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyTakeAutogradFunction_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ScaleGradGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_T_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__chunk_cat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__native_batch_norm_legit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_any_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bmm_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdouble_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cfloat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chunk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clone_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_contiguous_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cov_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eye_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ge_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geometric_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isreal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_unary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kthvalue_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_power_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svd_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svd_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorsolve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vander_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log1p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logdet_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_unpack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logaddexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logaddexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_maximum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_list_of_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mode_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_channel_shuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_similarity_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gaussian_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardswish_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_l1_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_linear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mish_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_constant_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_negative_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pdist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_prelu_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu6_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softsign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_threshold_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_static_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_fro_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ormqr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pca_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rand_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_remainder_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_interleave_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rot90_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_bartlett_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_cosine_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_gaussian_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_sampled_addmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_u_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_w_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_entr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_laguerre_polynomial_l_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtri_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_polygamma_special_polygamma_n_0_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_w_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_zeta_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_list_args_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sub_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensordot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triangular_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trunc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_where_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_xlogy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bernoulli_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cummin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_power_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_embed_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kthvalue_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vecdot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_zeros_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cross_entropy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_4_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_gaussian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unbind_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ne_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pairwise_distance_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_CubeGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_alias_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_constant_pad_nd_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gt_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvalsh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ne_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_ctc_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_slice_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_sparse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyMulAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_T_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cummin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ldexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_svdvals_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_short_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_unbiased_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_norm_subgradients_at_zero_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_binary_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_channel_shuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_grid_sample_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pixel_shuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize__cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmatmul___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_tensors_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_batch_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_constant_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rad2deg_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vdot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvmapjvp_linalg_solve_cuda 2025-09-07T07:39:38.4606638Z 2025-09-07T07:39:38.4606791Z Running test_decomp 6/22 ... [2025-09-07 07:39:38.253120] 2025-09-07T07:39:38.4607136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:39:38.4608020Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=6', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:39:38.253439] 2025-09-07T07:40:10.5287431Z 2025-09-07T07:40:10.5288309Z test_decomp 2/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_2.22_81f8ff7b321e3943_.log 2025-09-07T07:40:10.5387397Z Running 381 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geqrf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_igamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_imag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_qr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sgn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_split_with_sizes_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_frexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_uint8 2025-09-07T07:40:10.5479938Z 2025-09-07T07:40:10.5480084Z Running test_decomp 7/22 ... [2025-09-07 07:40:10.529333] 2025-09-07T07:40:10.5480406Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:40:10.5481302Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=7', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:40:10.529741] 2025-09-07T07:42:17.5030107Z 2025-09-07T07:42:17.5031029Z test_modules 1/3 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_1.3_2665f3b1c9de03bb_.log 2025-09-07T07:42:17.5385190Z Running 1194 items in this shard: test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_KLDivLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReLU6_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveAvgPool3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LeakyReLU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmin_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GRU_eval_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Linear_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm3d_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_eval_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConstantPad1d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softshrink_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_HingeEmbeddingLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerEncoderLayer_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CTCLoss_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTMCell_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTM_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiheadAttention_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiheadAttention_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_RNN_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_RNN_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_complex128, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HingeEmbeddingLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Sigmoid_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCEWithLogitsLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_complex32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSigmoid_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm2d_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Embedding_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CosineEmbeddingLoss_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_repr_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LPPool2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_SELU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Bilinear_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HuberLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LogSoftmax_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Threshold_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm2d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm2d_train_mode_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Bilinear_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CELU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CTCLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConstantPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConstantPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CrossEntropyLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Embedding_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_FractionalMaxPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_FractionalMaxPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRUCell_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GroupNorm_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GroupNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardshrink_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardshrink_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardswish_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardswish_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardtanh_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_HuberLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_HuberLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm3d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm3d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_L1Loss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_L1Loss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTM_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LeakyReLU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Linear_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LocalResponseNorm_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSigmoid_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSigmoid_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSoftmax_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MSELoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MarginRankingLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiLabelMarginLoss_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiLabelSoftMarginLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_NLLLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_NLLLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PReLU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PoissonNLLLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RMSNorm_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU6_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU6_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SoftMarginLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softshrink_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanh_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Threshold_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoderLayer_eval_mode_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoderLayer_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoder_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCELoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCELoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CELU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CELU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CTCLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CTCLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CTCLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv1d_swap_True_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_Conv2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ELU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Embedding_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GELU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GELU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GELU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRUCell_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRUCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRUCell_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GaussianNLLLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardshrink_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardswish_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardtanh_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardtanh_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HingeEmbeddingLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HingeEmbeddingLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_train_mode_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_L1Loss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_L1Loss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTMCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTMCell_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LayerNorm_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LayerNorm_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LayerNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LayerNorm_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LeakyReLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_False_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MSELoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MSELoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiMarginLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiMarginLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SELU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SiLU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SiLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SiLU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Sigmoid_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Sigmoid_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanh_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanh_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerDecoderLayer_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoderLayer_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoderLayer_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_True_set_grad_True_cuda_float32 2025-09-07T07:42:17.5721165Z 2025-09-07T07:42:17.5721322Z Running test_decomp 10/22 ... [2025-09-07 07:42:17.504741] 2025-09-07T07:42:17.5721646Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:42:17.5722511Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=10', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:42:17.505105] 2025-09-07T07:42:24.9803360Z 2025-09-07T07:42:24.9804102Z test_decomp 10/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.22_783c37bdc93fdc90_.log 2025-09-07T07:42:24.9916860Z Running 393 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_eval_mode_cuda_float64 2025-09-07T07:42:25.0013548Z 2025-09-07T07:42:25.0013691Z Running test_decomp 11/22 ... [2025-09-07 07:42:24.980944] 2025-09-07T07:42:25.0014080Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:42:25.0014973Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=11', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:42:24.981323] 2025-09-07T07:43:29.3170324Z 2025-09-07T07:43:29.3171298Z test_decomp 6/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_6.22_0bec487bcc812e54_.log 2025-09-07T07:43:29.3285334Z Running 441 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_igammac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_bartlett_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_masked_fill_cuda, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nan_to_num_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_std_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e4m3fn, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e5m2, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_igamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_igamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_bool
2025-09-07T07:43:29.3393777Z
2025-09-07T07:43:29.3393935Z Running test_decomp 14/22 ... [2025-09-07 07:43:29.317618]
2025-09-07T07:43:29.3394259Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:43:29.3395129Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=14', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:43:29.317968]
2025-09-07T07:44:25.0579430Z
2025-09-07T07:44:25.0580227Z test_decomp 11/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_11.22_9984725ddfe2356d_.log
2025-09-07T07:44:25.0688651Z Running 409 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rand___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pdist_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_nuc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_lerp_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nansum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex128, 
test/test_decomp.py::DecompOneOffTestsCUDA::test_contiguous_log_softmax_cuda, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float64 2025-09-07T07:44:25.0788886Z 2025-09-07T07:44:25.0789034Z Running test_decomp 15/22 ... [2025-09-07 07:44:25.058602] 2025-09-07T07:44:25.0789361Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:44:25.0790238Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=15', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:44:25.058965] 2025-09-07T07:44:31.1401991Z 2025-09-07T07:44:31.1403673Z torch_np/numpy_tests/core/test_multiarray 1/2 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_multiarray_1.2_6546f321c532c400_.log 2025-09-07T07:44:31.1541573Z Running 434 items in this shard: test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag__warn_on_write_flag_value_True_writeable_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag_writeable_flag_value_False_writeable_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag_writeable_flag_value_True_writeable_True, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_string_align, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_warnonwrite, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_any_base, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_from_buffer, test/torch_np/numpy_tests/core/test_multiarray.py::TestHash::test_int, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_attributes, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_readonly, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_set_stridesattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asanyarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_ascontiguousarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_cont, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_false, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_false_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_true, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asanyarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_ascontiguousarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asfortranarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_assignment_broadcasting, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_assignment_errors, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_cast_to_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_longdouble_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_stringlike_empty_list, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_newaxis, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_newaxis, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_array_too_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_empty_unicode, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_false_len_iterable, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_false_len_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_from_attribute, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_from_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_ragged_ndim_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_ragged_shape_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_sequence_non_homogeneous, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_structured_void_promotion_arr, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_structured_void_promotion_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_too_big_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_bytes, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_void, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero_all, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_sum, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_test_interning, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_any_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_integer, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_func0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_func1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_conjugate_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_copy, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal_view_notwriteable, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_dot, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_dot_out_mem_overlap, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_flatten, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_fuzz, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_integer, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_iterative, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_B, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_e, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_prod, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_ravel, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_f32, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype0_part_imag, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype0_part_real, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype1_part_imag, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_nans, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_degraded, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_squeeze, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_assign_mask, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_list, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_mask, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data13, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data15, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data16, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data21, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data25, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data26, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data29, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data3, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data30, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data34, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data39, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data42, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data44, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data45, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data46, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data52, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data56, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data58, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data8, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_maximum_signed_integers, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data11, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data15, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data16, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data17, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data26, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data27, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data28, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data3, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data30, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data32, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data33, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data34, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data35, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data37, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data40, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data41, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data43, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data45, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data46, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data48, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data5, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data50, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data51, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data52, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data57, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data7, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data8, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data9, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_minimum_signed_integers, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinMax::test_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestNewaxis::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_nan, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_flatten, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_truncate, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_byteorder_greater_True, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_overlaps, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_clip, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ip_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_out_overlap, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape0, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape1, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype4, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype5, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_datetime, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_mixed, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_ascii, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_counted_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_dtype_bool, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_empty_files_text, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_file_position_after_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_file_position_after_tofile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_bad_dup, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_offset, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_inf, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_io_open_buffered_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_largish_file, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_load_object_array_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_malformed, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_nofile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_numbers, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_read_shorter_than_count_subarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_binary_str, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_dump_pathlib, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_repr, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_string_with_ws, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_cleanup, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_unseekable_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_mmap_close, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_empty_view, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_freeform_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_int_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_none_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_zeros_appended, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_ddof, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_dtype_from_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_dtype_from_input, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_keepdims, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_float16, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_python_type, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_axis_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_byteorder, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_dimensions, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_array_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_uncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_all, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_2args, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_3args, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_3args_errors, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotcolumnvect2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatmat, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatvec, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatvec2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecscalar2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecvecinner, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecvecouter, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_huge_vectordot_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_huge_vectordot_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mv11, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN7, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN8, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_empty_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_exceptions, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_bool, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_contiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_result_types_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_scalar_output, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_shapes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_vector_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_vector_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_exceptions, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_axes, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_inplace, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_inplace_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_raises, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_result_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_scalar_output, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_shapes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_vector_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_3d_tensor, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_scalar_and_vector, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_broadcast1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops0, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops3, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_axis_spec, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_usigned_shortshort, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_byteorder_inside_struct, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_char_vs_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_intra_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_unnamed_fields, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_array_interfaces, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_scalars, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_striding_not_ok, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_writable_attributes_deletion, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestDelMisc::test_flat_element_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_int_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_dtype_mix, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_empty_result, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_ndim, test/torch_np/numpy_tests/core/test_multiarray.py::TestHashing::test_arrays_not_hashable, test/torch_np/numpy_tests/core/test_multiarray.py::TestHashing::test_collections_hashable, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_0d, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_1d_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_dot_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_put_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_infinite, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_nan_step, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_zero_step, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_1023, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_151, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_16, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_2047, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_24, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_32, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_383, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_48, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_64 2025-09-07T07:44:31.1675625Z 2025-09-07T07:44:31.1675772Z Running test_decomp 18/22 ... [2025-09-07 07:44:31.140759] 2025-09-07T07:44:31.1676104Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:44:31.1677000Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=18', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:44:31.141123] 2025-09-07T07:45:20.7946465Z 2025-09-07T07:45:20.7947406Z test_decomp 7/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_7.22_a7fa38495038066e_.log 2025-09-07T07:45:20.8037359Z Running 345 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_qr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_clamp_min_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_squeeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unfold_copy_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_polar_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_triu_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_train_mode_cuda_float64, test/test_decomp.py::DecompOneOffTestsCUDA::test_rms_norm_decomp_cuda_cuda 2025-09-07T07:45:20.8122006Z 2025-09-07T07:45:20.8122248Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T07:45:20.8122716Z Running test_decomp 19/22 ... [2025-09-07 07:45:20.795042] 2025-09-07T07:45:20.8123053Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:45:20.8123335Z Uploading artifacts took 0.00 seconds 2025-09-07T07:45:20.8124199Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=19', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:45:20.795377] 2025-09-07T07:46:03.2998632Z 2025-09-07T07:46:03.2999428Z dynamo/test_dynamic_shapes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dynamic_shapes_1.1_daf1a9a480b54283_.log 2025-09-07T07:46:03.3713021Z Running 1916 items in this shard: test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_arguments_binding_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_cpu_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_cpu_graph_break_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_cpu_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_cpu_graph_break_inner_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_decorator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_device_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_float64_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_graph_break_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_sdpa_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autograd_profiler_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autograd_profiler_enabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_context_wrapping_grad_mode_decorator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_context_wrapping_grad_mode_nested_function_decorator_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_context_wrapping_set_grad_enabled_nested_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_amp_autocast_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_device_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_across_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_created_outside_of_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_method_create_stream_outside_of_compile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_across_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_compared_with_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_compared_with_stream_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_context_manager1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_context_manager2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_disable_saved_tensors_hooks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_disable_saved_tensors_hooks_graph_break_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_disable_saved_tensors_hooks_prev_disabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_disable_saved_tensors_hooks_prev_disabled_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_generic_context_manager_CustomizedCtxManager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_generic_context_manager_customized_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_generic_context_manager_with_graph_break_CustomizedCtxManager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_generic_context_manager_with_graph_break_customized_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_generic_ctx_manager_with_graph_break_CustomizedCtxManagerWithGraphBreak_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_generic_ctx_manager_with_graph_break_customized_ctx_manager_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_grad_mode_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_graph_break_inlining_autocast_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_graph_break_inlining_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_inactive_context_graph_break_local_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_inactive_context_graph_break_local_nullctx2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_inactive_context_graph_break_local_nullctx_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_inactive_context_graph_break_stack2_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_inactive_context_graph_break_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_is_autocast_cpu_enabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_CustomizedCtxManager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_customized_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_with_graph_break_CustomizedCtxManager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_with_graph_break_customized_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_grad_mode_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_no_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_return_context_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_return_context_manager_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_sdpa_kernel_ctx_manager1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_sdpa_kernel_ctx_manager2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_sdpa_kernel_ctx_manager3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_sdpa_kernel_ctx_manager_as_decorator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_sdpa_kernel_ctx_manager_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_sdpa_kernel_ctx_manager_set_priority_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_torch_profiler_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_torch_profiler_use_after_with_block_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_T_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_add__dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_add_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_addcdiv__dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_addcdiv_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_addcmul__dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_are_functorch_transforms_active_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_attrgetter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_broadcast_foreach_pow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_build_list_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_lambda_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_torch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_chunks1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_class_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_cls_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_cls_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_cls_is_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_compare_constant_and_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_complex_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_const_tuple_add1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_const_tuple_add2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_constant1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_constant2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_constant3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_constant4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_constant_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_context_wrapping_nested_functions_no_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_cublas_allow_tf32_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_custom_dict_kwargs_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_constr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_lambda_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_defaultdict_setdefault1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_defaultdict_setdefault2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_defaultdict_setdefault3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_del_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_deque_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_device_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_device_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_copy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_fromkeys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_id_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_items_sorted_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_key_set1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_key_set2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_key_set3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_mutable_map_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_ops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_param_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_setdefault1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_setdefault2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_setdefault3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_sorted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_tuple_lazy_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_update_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_update_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_values_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_distributed_is_available_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_distributed_is_initialized_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dtype_compare_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_elipsis_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_enumerate_custom_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_enumerate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_enumerate_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_fallback_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_graph_break_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_infinite_iterator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_finfo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_flat_param_same_storage_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fn_with_self_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_foreach_lerp__dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings4_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings6_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_funcdef_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_functools_cache_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_functools_partial_binding_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_functools_partial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_generic_namedtuple_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_generic_namedtuple_subclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_generic_namedtuple_user_methods_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_get_autocast_gpu_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_get_calculate_correct_fan_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_get_default_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_get_device_properties_tensor_device_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_get_privateuse1_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_getattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_getattr_metaclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_globalfn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_globalmodule_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_globalvar_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_import1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_in_not_in_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indexed_range_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indirect1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indirect2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indirect3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_jit__unwrap_optional_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_jit_annotations_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_lru_cache_fn_with_default_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_script_if_tracing_fn_with_default_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_softmax_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_with_default_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inner_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_any_autocast_enabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_checkpoint_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_complex_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_contiguous_frame_counts_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_contiguous_memory_format_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_floating_point_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_fx_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_in_onnx_export_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_inference_mode_global_recompilation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_inference_recompilation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_integer_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_not_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_not_null_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_quantized_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_sparse_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_islice_chain_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itemgetter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_chain_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_chain_from_iterable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_combinations_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_compress_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_compress_tensors_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_filterfalse_basic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_pairwise_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_permutations_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_permutations_basic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_permutations_various_iterators_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_product_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_product_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_product_various_iterators_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_jit_annotate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_len_constant_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_len_constant_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_len_constant_misc_iterables_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_len_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_add_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_add_then_mutate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_clear_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_compare_polyfill_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_compare_polyfill_non_lists_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_convert_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_expand_lhs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_index_with_constant_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_reversed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_setitem_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_setitem_slice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_slice_assignment_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_slice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_sorted1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_sorted2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_truth_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_load_global_bool_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_lru_cache_warning_issued_during_tracing_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_mT_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_manual_seed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_call_function_ex_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_deque_extendleft_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_dict_fromkeys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_enumerate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_infinite_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_list_extend_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_list_slice_assign_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_max_const_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_max_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_partial_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_reduce_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_sorted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_str_join_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_sum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_unpack_twice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_unpack_vars_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_zip_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_math_radians_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_mean_sum_np_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_methodcall1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_methodcall2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_methodcall3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_methodcaller_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_min_max_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_module_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_defaults_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_fields_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_replace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_subclass_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_user_methods_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_builtin_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_methods_returning_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_reshape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_transpose_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndim_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_no_recompile_inner_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_no_recompile_inner_lambda_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_non_inlined_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_not_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_constant_collections_as_input_int_or_float_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_constant_collections_as_input_int_or_float_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_constant_collections_guards_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_constant_collections_guards_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_finfo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_iinfo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_as_integer_ratio_num_type0_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_as_integer_ratio_num_type3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_bit_length_num_type1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_conjugate_num_type2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_conjugate_num_type4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_hex_num_type5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_is_integer_num_type6_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_attributes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_dtype_argument_to_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_dtype_call_in_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_fft_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_linalg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_meshgrid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_random_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_obj_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_obj_is_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ordered_dict_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partial_across_graph_break_uninvoked_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_as_input_UDF_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_as_input_partials_lambda_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_as_input_partials_mod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_graph_break_reconstruct_args_and_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_graph_break_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_graph_break_reconstruct_mix_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_graph_break_reconstruct_mix_no_source_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___annotations___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___builtins___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___call___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___class___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___closure___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___code___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___defaults___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___delattr___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___dict___dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___dir___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___doc___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___eq___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___format___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___ge___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___get___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___getattribute___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___globals___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___gt___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___hash___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___init___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___init_subclass___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___kwdefaults___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___le___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___lt___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___module___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___name___dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___ne___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___new___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___qualname___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___reduce___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___reduce_ex___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___repr___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___setattr___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___sizeof___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___str___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___subclasshook___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr_keywords_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_set_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_lambda_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_recompilation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_torch_op_arg_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_torch_op_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_udf_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_udf_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_udf_kwarg_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_udf_kwarg_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_pop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_pos_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_pow_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_promote_types_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_rand_inlined_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_rand_tensor_partial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_iterator_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_iterator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_iterator_graph_break_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_iterator_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_length_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_with_index_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_with_slice_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_with_initial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_with_none_initial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_with_single_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_with_single_with_initial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_dict2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_multiple_numpy_ndarray_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_numpy_ndarray_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_tuple1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_tuple2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_returning_recursive_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_round_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_add_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_in_frozenset_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_keys_view_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_update_bytecode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_update_list_with_duplicated_items_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_shape1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_shape2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_size_tuple_add_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice6_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sliced_range_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sorted_const_key_non_const_items_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sourceless_build_method_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_startswith_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_shortcut_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_shortcut_with_start_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_shortcut_with_start_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_with_start_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_with_start_kwarg_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_symbool_to_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_dim_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_element_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_is_complex_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_len_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_new_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_new_with_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_size_indexed_by_symint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_type2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_type3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_type4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_type5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tensor_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_to_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_distributions_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_from_numpy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_get_device_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_size_as_dict_key_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_size_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_source_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_transpose_for_scores_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_truth_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple_iadd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple_map_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple_sorted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_two_point_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unary_fold_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unary_fold_op_seq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack_ex1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack_ex2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack_ex3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack_mutable_map_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unsqueeze_inplace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_viamethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_viatorch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_zip_longest_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_zip_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_312_binary_slice_with_graph_break1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_312_binary_slice_with_graph_break2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_RAISE_VARARGS_0_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_T_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_add_sizes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_add_to_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_anomaly_aot_autograd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_any_all_symnode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_aot_autograd_propagate_unbacked_symints_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_arange_length_with_float32_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_argwhere_with_dynamic_shapes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assert_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assert_size_stride_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assigning_function_to_class_attribute_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assigning_function_to_object_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_backend_match_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_backend_match_guard_multi_threads_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_backward_deterministic_mode_mismatch_warning_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_boolarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_bound_shape_checks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_build_tuple_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builder_for_class_with_metaclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_abs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_bool_on_symbool_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_bool_on_symfloat_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_bool_on_symint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_complex_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_complex_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_str_on_user_defined_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_subclasses_as_method_on_class_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_subclasses_as_method_on_var_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_call_parent_non_class_methods_from_child_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_callpacked_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cannot_trace_mark_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cannot_trace_mark_dynamic_safe_unreached_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cast_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cat_unbacked_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_catch_watchings1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_catch_watchings2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cell_captured_by_existing_func_but_not_root_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cell_output1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cell_output2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_class_binop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_class_duner_flags_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_class_duner_mro_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_class_has_instancecheck_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_clone_sparse_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_out_of_scope_cell_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_out_of_scope_cell_with_cond_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_out_of_scope_cell_with_mutation_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_recompiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_with_mutation_and_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_write_across_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_shapes_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_shapes_neq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_shapes_tuple_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_shapes_tuple_neq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_shapes_with_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_tensor_with_none_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compilation_metrics_size_limit_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_export_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_export_single_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_with_quantization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_conditional_list_comp_in_context_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_config_getattr_default_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_config_obj_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_const_dict_variable_python_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_constant_getattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cross_entropy_loss_fancy_ctor1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cross_entropy_loss_fancy_ctor2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cross_entropy_loss_simple_ctor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_custom_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_custom_module_free_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_data_access_in_inference_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_data_ptr_graph_break_aten_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_data_ptr_graph_break_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dataclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dataclass_fields_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dataclass_local_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_default_args_device_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_default_dtype_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_defaultdict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_deque_append_left_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_deque_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_derpy_nn_module_usage_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_descriptor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_descriptor_side_effect_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_deterministic_algorithms_mutated_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dictcomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_disable_flag_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dtypes_no_graphbreaks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_methods_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_weakref_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_duplicate_graph_break_log_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_one_hot_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_shapes_as_strided_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_dynamic_override_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_dynamic_override_regex_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_force_parameter_static_shapes_and_property_static_shapes_override_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_precedence_over_int_specialization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_cache_invalidate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_cache_move_to_front_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_compiling_fake_tensor_to_vararg_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_disabled_in_custom_op_kernels_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_min_operator_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_reset_clears_cache_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_empty_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_enum_as_dict_key_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_enum_as_dict_key_with_overloaded_str_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_enum_guards_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_enum_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_enum_no_graphbreaks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_enum_subclass_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_error_on_nested_fx_trace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_error_on_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_escaping_closure_var_with_backward_hook_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_escaping_closure_var_with_nonlocal_var_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_existing_func_that_creates_capturing_nested_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fail_on_recompile_error_message_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_flat_name_to_original_fqn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_float_speculation_log_divergence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fn_hasattr__name__1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fn_hasattr__name__2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fn_hasattr__name__3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fold_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_free_var_and_local_name_collision_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozen_dataclass_attr_access_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozen_dataclass_default_factory_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozen_dataclass_default_value_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozen_dataclass_hashable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozen_dataclass_kw_only_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozen_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozenset_of_non_literals_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_frozenset_torch_func_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fullgraph_capture_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_funcname_cache_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_function_annotation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_function_generic_alias_annotation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_generate_tensor_from_list_of_numpy_primitive_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_generate_trivial_abstract_impl_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_get_attr_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_get_cache_entry_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_get_custom_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_get_instruction_source_311_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_getattr_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_getattrvariable_as_python_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_getset_descriptor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_global_state_guard_serialization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_non_none_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_none_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_state_mutated_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_graph_break_compilation_metrics_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_graph_break_compilation_metrics_on_failure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_graph_break_correctly_when_passing_numpy_ndarray_to_torch_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_failure_fn2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_failure_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_failure_fn_shape_control_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_failure_fn_tensor_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_fn_by_id_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_fn_by_is_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_fn_by_name_and_value_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_globals_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_inbuilt_nn_modules_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_nn_modules_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_tensors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_function_builder_with_cse_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_size_oblivious_backed_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_size_oblivious_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_size_oblivious_simplification_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_sym_node_fstring_when_used_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guards_cse_pass_multiple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guards_cse_pass_single_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guards_strip_function_call_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_hasattr_nn_module_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_hash_getitem_slice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_hash_hop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_id_guarded_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_id_guarded_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_id_guarded_object_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_id_of_nn_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_id_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_nn_mod1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_nn_mod2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_nn_mod3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_user_defined_object2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_user_defined_object3_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_user_defined_object_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inference_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inference_mode_param_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_closure_not_loaded_by_parent_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_closure_returned_by_another_function_and_captures_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_dict_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_dict_function_passed_as_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_dict_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_func_jump_on_tensor_condition_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_list_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_local_dict_clear_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_module_attr_dict_clear_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_user_defined_dict_attr_clear_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_desugaring_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_param_update_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_view_on_graph_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_input_cell_mutation_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inspect_signature_bind_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inspect_signature_bind_non_user_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inspect_signature_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_int_comparisons_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_neg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_shape_binops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_shape_comparisons_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_shape_inplace_binops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_intermediary_tensor_grad_access_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_invalid_args_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_compiling_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_floating_point2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_floating_point_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_tensor2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_tensor_like2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_tensor_like_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_item_changes_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_item_changes_new_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_item_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_iter_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_iter_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_iterator_limit_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_symint_default_sum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_tensors_builtins_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_tensors_default_sum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_tensors_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_tensors_user_defined_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_groupby_pure_python_default_identify_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_groupby_pure_python_key_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_infinite_count_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_infinite_cycle_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_infinite_repeat_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_infinite_repeat_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_islice_default_end_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_islice_default_step_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_islice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_repeat_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_tee_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_large_reduction_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_linear_module_free_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_append_return_none_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_hasattr1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_hasattr2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_iadd_side_effect_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_iadd_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_iterator_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_mul_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_slice_mul_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_listcomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_load_fast_and_clear_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mandelbrot_numpy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_map_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_map_with_quantization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mark_dynamic_with_ranges_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mark_static_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mark_unbacked_strict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_matmul1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_min_max_over_iterable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_module_complex_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_module_deepcopy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_module_not_callable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mro_type_tensor_no_source_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_multiple_inheritance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mutable_mapping_multiple_inheritance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_named_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple_with_custom_getitem_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nan_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ne_operator_with_custom_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ne_operator_with_custom_graphbreak_eq_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ne_operator_with_custom_ne_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_closure_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_dataclass_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_frozen_dataclass_hashable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_function_resuming_with_correct_globals_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_optimize_decorator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_optimize_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_optimize_run_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_sequential_try_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_sequential_try_with_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_sequential_try_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_sequential_with_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_wraps_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nesteduserfunction_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_new_with_int_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_newly_constructed_tensor_attr_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_functional_reduction_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_module_getattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_module_getattribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_sequential_invocation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_sequential_invocation_reposition_indices_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_no_error_on_nested_fx_trace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_no_guard_for_unused_sym_node_fstring_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_no_raise_guard_partial_constraint_across_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_no_raise_guard_partial_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_non_pt2_compliant_ops_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_not_dynamic_scope_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numel_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_array_of_arrays_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_as_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_fallback_on_eager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_force_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_gt_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_int_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_min_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_ndarray_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_ndarray_graph_break_with_multiple_outputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_ndarray_works_with_builtin_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_no_raise_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_non_torch_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_random_config_to_numpy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_readonly_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_recompilation_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_size_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_subdtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_take_along_axis_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_tolist_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_torch_operators_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_ufunc_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_ufunc_out_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_unique_f16_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_variable_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_with_builtin_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_object_classmethod_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_object_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_object_staticmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_onnx_shape_as_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_optimize_on_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ordered_dict_alias_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ordered_dict_move_to_end_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_os_environ_get_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_os_environ_set_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variant_custom_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variants_with_resizing_on_graph_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variants_with_resizing_on_graph_inputs_with_dynamic1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variants_with_resizing_on_graph_inputs_with_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_outside_linear_module_free_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_overridden_getattribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_packaging_version_parse_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pair_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_param_shape_binops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_parameter_free_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_patched_builtin_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pep0479_convert_stopiteration_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_precompile_entries_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_precompile_entry_hit_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_precompile_entry_miss_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_precompile_fail_on_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_proxy_frozen_dataclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pt2_compliant_ops_are_allowed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pt2_compliant_overload_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pure_python_accumulate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_py_guards_mark_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_python_slice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raise_guard_full_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raise_guard_indirect_full_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raise_guard_partial_constraint_across_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raise_guard_partial_constraint_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raise_on_backend_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raises_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raises_importerror1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raises_importerror2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_range_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_range_iter_guards_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_range_iter_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_range_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_real_imag_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recompile_message_on_parameter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recompile_on_disable_1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recompile_on_disable_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recompile_on_global_state_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_reconstruct_frozen_dataclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_reconstruct_set_across_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recursion_depth_guards_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recursive_inline_list_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recursive_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_release_input_memory_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_release_module_memory_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_release_scope_memory_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_remove_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_repeat_interleave_graphbreaks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_repro_graph_breaks_in__get_item_by_idx_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_restore_graphstate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_return_dict_with_graph_break_and_update_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_return_nested_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_returning_func_with_captured_func_and_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_returning_nested_func_with_captured_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_running_func_with_captured_func_and_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_running_nested_func_with_captured_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_runtime_assert_replacement_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sample_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_scalar_device_movement_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_scalar_tensor_is_equivalent_to_int_list_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_scalar_tensor_is_equivalent_to_symint_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_scalar_tensor_is_equivalent_to_symint_list_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sequential_module_free_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_set_aliasing_recompiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_set_custom_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_set_descriptor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_set_discard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_set_update_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_setattr_mutation1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_setattr_mutation2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_setattr_mutation3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_and_tuple_equality_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_constructor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_create_symbolic_sizes_strides_storage_offset_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_empty_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_evaluate_expr_divisible_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_evaluate_expr_refinement_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_evaluate_expr_replacement_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_runtime_assert_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_unbacked_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_no_recording_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_recorded_function_fallback_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_int_comparisons_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_int_inplace_binops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_side_effects_codegen_update_mutated_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_simple_set_usage_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_size_dim_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_size_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_slice_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_source_non_input_grad_access_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sourceless_namedtuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_storage_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_str_format_assert1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_str_format_assert2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_str_format_return1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_str_format_return2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_stride_dim_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_structseq1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_structseq2_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_super_after_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_super_calling_with_metaclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sym_and_terms_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sym_constrain_range_on_replaced_unbacked_symbol_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sym_max_unbacked_sizelike_simplification_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_symint_as_device_kwarg_multi_gpu_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_symint_as_device_kwarg_non_strict_export_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_symint_copy_into_unbacked_slice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_symint_fold_nontrivial_product_modulo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_sys_modules_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tagging_tensors_mix_used_unused_structure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tagging_tensors_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_build_list_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_ctor_list_of_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_data_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_dict1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_dict2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_dict3_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_dot_grad_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_dynamic_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_interacts_with_numpy_ndarray_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_is_contiguous_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_item_capture_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_item_no_capture_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_layout_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_setattr_getset_descriptor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_types_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_thread_local_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_0d_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_1d_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_kd_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_kd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_top_package_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_check_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_check_is_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_check_symbolic_shape_rel_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_compile_ctx_on_forward_and_training_step_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_distributions_lazy_property_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_dtype_python_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_dynamo_codegen_pow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_generator_set_state_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_guards_stack_frame_register_inlining_deep_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_guards_stack_frame_register_inlining_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_nn_parameter_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_objects_as_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_package_working_with_trace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_seed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_size_numel_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_size_numel_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_variable_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_trace_ndarray_frame_2_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_trace_ndarray_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_from_tuple_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_iadd_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_mul_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_mul_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_type_copy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_typing_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_typing_typevar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_typing_union_and_optional_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_typing_variable_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_2d_expand_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_empty_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_repeat_cat_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_sources_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_sources_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_strict_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_symint_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unhandled_exception_in_dynamo2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unhandled_exception_in_dynamo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unique_consecutive_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unpack4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unpack5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unpack_tensor_shape_mismatch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_update_locals_and_stack_uses_shared_cache_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_code_statically_known_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_binop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_class_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_class_python_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_object_class_interaction_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_setattr1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_setattr2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_function_variable_supports_enum_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_function_variable_supports_function_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_function_variable_supports_type_abcmeta_argument_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_getattr1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_getattr2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_getattribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_property_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_usr_cls_classmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_usr_cls_staticmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_validate_outputs_unbacked_by_custom_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_validate_outputs_unbacked_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_variable_access_in_exception_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_variable_tracker_recursively_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_version_ci_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_with_builtin_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_write_to_cells_with_name_shadowing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_write_to_closures_in_inlining_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_writes_to_cells_across_frames1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_writes_to_cells_across_frames2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_yield_from_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_yield_from_in_a_loop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_yield_from_user_stop_iteration_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_yield_gen_and_from_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_yield_send_to_subgenerator_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_312_local_cell_overlap_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_Size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_abc_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_add_complex_conj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_add_sub_alpha_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_addr_alpha_beta_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_amp_foreach_fake_impl_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_aot_autograd_runtime_wrapper_prologue_profiled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_as_strided_on_base_with_mutation_works_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_as_strided_on_existing_view_banned_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_attached_attribute_in_dir_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_autograd_function_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_avoid_dupe_specialization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_batch_encoding_clone_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_batch_norm_act_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_batchnorm_e2e_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_bigbird_unsqueeze_inplace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_bitwise_op_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_bitwise_print_precedence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_boxes_len_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_build_map_unpack_with_call_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_c_defined_metaclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_changing_stride_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_chunk_reformer_ff_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_class_member_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_classmethod_with_slots_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_compilation_metrics_on_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_compile_complex_conj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_compile_copy__int_overload_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_const_dict_keyerror_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_contains_range_constprop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_convert_boxes_to_pooler_format_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_copy_weird_strides_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_create_rand_mask_from_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dalle2_maybe_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_data_attr_mutation_after_saved_for_bw_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dataclass_in_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dataclass_init_with_default_factory_with_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_ddp_checkpoint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dedup_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_deferred_runtime_asserts_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_delattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_delattr_raises_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_delattr_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_delete_local_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_deleted_compile_wrapper_segfault_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_delsubscr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_delsubscr_raises_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_detectron2_instances_cat_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_disabling_unpack_hooks_within_compiled_region_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_distributions_subclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_do_paste_mask_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dont_aggressively_write_assert_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dropout_inline_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shape_disable_duck_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_double_not_equal_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_float_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_implicit_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_right_side_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_ellipsis_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_embedding_backward_broadcasting_decomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_empty_graph_nested_calls_fullgraph_False_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_empty_graph_nested_calls_fullgraph_True_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_empty_list_contains_with_jump_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_empty_out_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_enum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_ephemeral_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_error_return_without_exception_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_exception_in_dynamo_handling_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_exec_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_exec_wildcard_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_flip_bad_accuracy_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_for_loop_graph_break_before_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_for_loop_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_foreach_decomp_arg_names_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_fsdp_set_input_mutation_applied_when_input_gets_no_gradients_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_function_in_skipfiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_functools_wraps_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_gan_repro_trying_to_backward_through_the_graph_a_second_time_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_generator_dealloc_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_get_parameter_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_get_type_hints_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_global_fn_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_grad_mode_carrying_correct_state_after_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_grad_references_cleared_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_graph_break_on_jit_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_graph_break_on_jit_isinstance_pep585_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_graph_break_unsupported_fake_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_guard_default_device_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_guard_fail_nested_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_guard_fail_tensor_bool_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_guard_ordering_shape_fail_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_guard_with_tuple_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hasattr_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_bigbird_unsqueeze_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_classinstantier_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_gelu_inline_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_model_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_t5_forward_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_xsoftmax_inference_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_xsoftmax_training_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_iadd_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_incompatible_configs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_indexing_with_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inductor_dynamic_shapes_broadcasting_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inductor_no_recursionerror_on_for_loops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inductor_rng_default_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inference_mode_dynamic_shapes_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inlining_cornercase_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inplace_unsqueeze_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_int_format_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_intermediate_leaf_requires_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_invalid_seq_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_is_make_fx_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_is_symbolic_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_isinstance_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_isinstance_storage_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue111522_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue111918_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue114171_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue126128_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue134451_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue1466_size_aot_autograd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_issue175_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_jit_script_defaults_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_jit_trace_errors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_kwargs_out_list_variable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_aliasing_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_index_not_found_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_index_tensor_unsupported_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_reverse_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_self_reference_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_listcomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_longformer_chunk_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_longtensor_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_lru_cache_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_maml_item_capture_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_maml_no_item_capture_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_many_overlapping_inputs_does_not_explode_guards_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_many_views_with_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_map_with_multiple_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_maybe_multiply_symint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_merge_criteria_processor_list1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_merge_criteria_processor_list2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_method_overriding_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_module_in_skipfiles_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_modules_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_multi_dot_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_multi_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_named_buffers_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nanmean_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_negative_floor_div_solve_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_negative_shape_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nested_while_loop_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_module_callable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_module_property_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_module_stack_bc_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_param_freevar_codegen_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_parameter_ctor_graph_breaks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_parameter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_parametrize_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_no_grad_inline_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_no_tracing_into_eval_frame_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_no_tracing_into_eval_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nonconst_issubclass_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_not_rewrite_assert_for_other_errors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nullcontext1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nullcontext2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_numpy_not_ndarray_recompiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_numpy_tobytes_no_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_odict_get_item_index_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_omegaconf_dictconfig_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_omegaconf_listconfig_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_omegaconf_listconfig_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_ones_out_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_optim_state_references_cleared_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_optimized_deepcopy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_optimized_module_patched_init_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_optimized_module_training_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_os_fspath_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_nested_cell_shape_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_nested_cell_tuple_shape_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_none_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_overload_non_contiguous_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_root_cell_shape_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_root_cell_tuple_shape_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_output_aliases_intermediate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_overlapping_inputs_with_dynamic_shapes_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_overwriting_params_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_partially_initialized_module_property_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_partitioner_activation_memory_budget_with_unbacked_symints_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_partitioner_cse_respects_mutation_boundaries_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_pointless_graph_removal_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_primtorch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_primtorch_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_randint_out_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_recursive_map_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reformer_eval_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reformer_min_chunk_len_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reformer_sorting_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reformer_train_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reinplacing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_relative_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_relative_import_no_modulename_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_requires_grad_guards_with_grad_mode1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_requires_grad_guards_with_grad_mode2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_restricted_list_subclass1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_restricted_list_subclass2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_restricted_list_subclass3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_return_value_duplication_mixed_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_return_value_duplication_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_return_value_duplication_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_return_weakref_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rewrite_assert_dont_change_bytecode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rewrite_assert_noop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rewrite_assert_with_msg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rewrite_assert_with_non_string_msg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rewrite_assert_without_msg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rng_state_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_seq_append_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_setattr_requires_grad_graph_breaks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_setitem_boolean_mask_diff_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_setitem_tensor_prop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_setitem_tuple_boolean_mask_diff_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sigmoid_out2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sigmoid_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_size_typematch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_slice_into_list_mutable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_slicing_dynamic_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_slicing_dynamic_shape_setitem_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sort_out2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sort_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_specialized_stride_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_split_with_sizes_aot_autograd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_staticmethod_allow_in_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_stk_sdd_is_transposed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_stop_iteration_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_str_isalnum_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_string_format_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_subclass_graph_output_repro_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_super_classmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_super_classmethod_inheritance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_super_diamond_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_super_in_staticmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_super_staticmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_swin_base_tensor_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_symint_bitwise_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_symnode_is_not_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_symnode_is_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sys_monitoring_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_data_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_isinstance_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_item_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_random_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_aot_eager_func_name_func1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_aot_eager_func_name_func2_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_aot_eager_func_name_func3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_eager_func_name_func1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_eager_func_name_func2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_eager_func_name_func3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_inductor_func_name_func1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_inductor_func_name_func2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_inductor_func_name_func3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_mismatched_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_split_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_split_within_device_cm_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_uniform_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_threading_local_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tokenization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_compile_in_compile_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_ops_aten_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_tensor_ops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_tensor_ops_no_graph_break_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_variable_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torchname_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_trace_functional_tensor_with_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tuple_enum_as_key_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_typed_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_typed_dict_total_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_udf_classes_reconstruction_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unbacked_arange_in_bounds_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unbind_copy_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unpack_hooks_can_be_disabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unpack_hooks_dont_run_during_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unspecialized_nn_module_with_torch_variable_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unsqueeze_mul_strides_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_user_ctor_ctx_manager_custom_init_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_user_ctor_ctx_manager_custom_init_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_user_ctor_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_user_defined_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_user_defined_object_callable_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_validate_model_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_vc_bumped_in_inference_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_vdd_duplicate_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_view_dtype_overload_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_callback_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_construction_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_del_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_proxy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_while_loop_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_while_loop_graph_break_inside_call_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_with_on_graph_break_inst_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_with_on_graph_break_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_zeros_out_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_access_by_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_basicmodule1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_basicmodule2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_call_fn_with_non_const_inputs_safe_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_cfgmod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_children_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_constloop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_conv_call_forward_directly_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_conv_call_super_forward_directly_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_conv_transpose_call_forward_directly_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_conv_transpose_call_super_forward_directly_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_densenet_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_enumvalues_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_fnmember_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_fnmembercmp1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_fnmembercmp2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_forward_directly_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_generation_tag_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_inject_module_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_intarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_iseval1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_iseval2_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_isnonelayer_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_istraining1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_istraining2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_layerlist_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module6_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module7_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module_bad_params_call_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module_bad_params_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module_no_cls_to_become_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module_speculation_log_divergence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_attribute_precedence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_call_module_with_static_forward_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_class_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_comparison_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_forward_has_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_guard_name_is_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_name_string_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_property_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_static_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_moduledict_custom_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_moduledict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulelist_custom_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulelist_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulelist_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulemethod1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulemethod2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_named_children_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_nn_module_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_nn_module_unspec_int_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_nn_moduledict_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameterdict_custom_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameterdict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters1_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_self_mutating1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_seq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_sequential_with_duplicated_module2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_sequential_with_duplicated_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_simple_torch_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_stringmember_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_submodules1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_submodules2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_super1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_super2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_super_class_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_tensorlist_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_torch_function_with_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_torch_mangled_class_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_unsupportedmethod_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_unsupportedmodule_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_viamodulecall_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_access_class_method_from_user_class_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_access_class_method_from_user_class_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_byte_tensor_does_not_crash_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_capture_symbolic_tracing_simple_within_fake_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_capture_symbolic_tracing_within_fake_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_free_variables_overlapping_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_op_param_buffer_lifted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_branch_args_mismatch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_branch_return_multiple_tensors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_branch_return_non_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_mismatch_return_length_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_mismatch_return_tensor_meta_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_missing_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_non_list_operands_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_non_tensor_operands_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_raise_user_error_on_unsupported_pred_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_cond_supported_pred_types_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_constraint_violation_error_messages_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dataclass_input_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dict_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dict_return_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_2_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_non_tensor_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_non_tensor_arg_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_non_tensor_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_non_tensor_output_with_aten_graph_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamic_slicing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamic_slicing_invalid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamic_slicing_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamo_enum_in_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamo_list_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_empty_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_enforce_equalities_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_compare_optimize_with_make_fx_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_cond_in_aten_symbolic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_control_flow_with_getattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_decomp_asserts_bad_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_decomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_defaults_ok_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_dynamic_control_flow_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_dynamic_dim_cleanup_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_dynamic_dim_not_1_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_dynamic_dim_range_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_bypass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_bypass_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_with_complex_reorder_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_with_complex_reorder_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_with_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_with_list_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_identity_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_masking_with_no_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_meta_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_meta_val_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_mismatched_out_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_mismatched_out_2_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_mismatched_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_mismatched_out_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_module_specify_constraints_signature_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_multi_dynamic_dim_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_multi_dynamic_dim_unsafe_relationship_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_nn_module_stack_patched_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_no_raise_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_no_tensor_computation_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_pass_arg_by_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_pass_arg_by_name_star_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_persist_assert_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_preserve_constraints_as_metadata_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_preserves_nn_module_stack_for_get_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_raise_guard_full_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_raise_guard_partial_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_raise_on_relationship_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_shape_control_flow_1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_specialized_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_symbolic_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_and_empty_kwargs_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_with_default_None_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_with_default_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_with_default_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_with_default_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_builtin_op_on_assume_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_cond_branches_calling_methods_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_cond_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_cond_dynamic_shape_pred_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_cond_with_closed_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_dict_values_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_free_function_and_class_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_free_function_and_class_method_multiarg_diff_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_free_function_and_class_method_multiarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_free_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_global_function_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_in_unspecialized_nn_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_list_nonzero_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_list_nonzero_free_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_method_on_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_method_on_module_invoke_twice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_none_control_flow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_none_control_flow_free_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_not_none_control_flow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_not_none_control_flow_free_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_not_none_control_flow_pos_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_not_return_const_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_tuple_nonzero_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_functools_wrapped_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_functools_wrapped_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_and_empty_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_with_default_None_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_with_default_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_with_default_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_with_default_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_map_cond_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_map_zero_sized_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_map_zero_sized_tensor_suppress_errors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_module_layer_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_nonzero_static_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_shallow_list_copy_with_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_shallow_list_copy_wo_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_stack_trace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_symbool_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_wrapped_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_exported_graph_serialization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_func_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_func_return_with_aten_graph_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_fx_pytree_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_immutable_list_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_input_container_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_invalid_input_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_invalid_input_global_multiple_access_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_invalid_input_nonlocal_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_invalid_input_unused_nonlocal_ok_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_list_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_list_not_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_list_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_list_unpack_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_map_cond_param_buffer_lifted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_mixed_real_and_fake_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_multiple_outputs_op_with_evaluator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_nested_cond_op_param_buffer_lifted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_no_tensor_computation_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_no_tensor_computation_2_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_no_tensor_computation_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_no_tensor_computation_fail_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_not_functionalize_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_param_buffer_safe_from_mutation_recurse_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_param_buffer_safe_from_mutation_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_pre_dispatch_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_predispatch_with_for_out_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_predispatch_with_for_out_dtype_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_predispatch_with_higher_order_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_predispatch_with_higher_order_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_preserve_fx_node_metadata_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_preserve_fx_node_metadata_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_preserve_fx_node_metadata_inline_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_preserve_fx_node_metadata_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_remove_redundant_dynamic_dim_in_error_message_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_retracibility_dict_container_inp_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_retracibility_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_retracibility_nested_list_out_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_round_dynamic_shapes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_strict_fake_tensor_prop_real_tensors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_subclass_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_sum_param_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_sym_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_symbolic_tracing_within_fake_mode_with_constraints_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_symbolic_tracing_within_fake_mode_with_constraints_with_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_symbool_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_torch_inference_mode_ctx_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_trivial_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_uncaptured_higher_order_op_error_not_suppresed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_untracked_inputs_in_constraints_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_zeroes_in_and_out_different_shape_on_test_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_zeroes_in_and_out_different_shape_on_test_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_zeroes_in_new_shape_scalar_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_zeroes_in_new_shape_scalar_out_permute_dupe_and_bypass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_zeroes_in_new_shape_scalar_out_permute_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_capi_call1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_capi_call2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_capi_call3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_duck_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_getitem_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_order_dependence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_zero_inference_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_enumerate_not_break_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_extended_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_graph_break_on_item_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_indirect_unsupported1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_indirect_unsupported2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_indirect_unsupported3_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_multigraph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_no_graph_break_on_item_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_pop_after_resume_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_restore_range_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_restore_range_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_restore_state_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_freevars_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_paths_join_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_tuple_iterator_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_with_no_grad1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_with_no_grad2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_with_no_grad3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_stack_state1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_stack_state2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start1_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_tuple_iterator_mutate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_tuple_iterator_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_access_module_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_allow_python_side_effects_utility_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_constants_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_global_num_adds_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_global_num_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_input_num_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_numpy_number_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_tracked_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_tracked_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_untracked_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_untracked_global_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_untracked_nonlocal_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_capture_value_created_in_subgraph_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_concat_unbacked_shape_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_branches_no_arguments_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_branches_no_arguments_no_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_free_variable_in_both_branches_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_graph_break_in_one_branch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_pytree_operands_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_pytree_operands_with_non_tensor_leaves_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_side_effect_in_one_branches_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_subgraph_name_is_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_with_constant_pred_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_with_empty_operands_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_dynamic_shapes_over_vmap_batch_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_enum_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_error_message_sane_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_fallback_on_graph_break_complicated_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_fallback_on_graph_break_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_flat_list_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_fn_with_kwargs_in_torch_ops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_freevars_as_inputs_to_wrap_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_grad_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hints_wrapper_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hints_wrapper_incorrect_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hints_wrapper_no_hints_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hints_wrapper_pytree_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hooks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hopify_generic_wrap_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_inlined_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_internal_nonlocal_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_lift_tensor_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_lift_tensors_with_compound_expressions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_lift_tensors_with_shared_symbols_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_make_closure_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_example_value_metadata_consistent_with_eager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_lowers_to_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_multi_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_pytree_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_side_effect_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_subgraph_name_is_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_symint_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_modules_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_nested_tuple_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_nested_wrap_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_no_freevars_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_output_with_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_register_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_register_subclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_return_captured_var_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_return_captured_var_used_multiple_times_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_return_captured_vars_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_same_freevar_twice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_del_existing_attr_global_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_del_existing_attr_global_obj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_del_existing_attr_nonlocal_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_del_existing_attr_nonlocal_obj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_in_body_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_local_list_append_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_global_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_global_num_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_global_num_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_global_tensor_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_global_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_nonlocal_num_builtin_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_nonlocal_num_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_nonlocal_tensor_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_nonlocal_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_nested_nonlocal_list_append_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_nonlocal_list_append_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_existing_attr_global_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_existing_attr_global_obj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_existing_attr_nonlocal_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_existing_attr_nonlocal_obj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_new_attr_global_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_new_attr_global_obj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_new_attr_nonlocal_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_new_attr_nonlocal_obj_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_support_float_in_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_symint_in_slice_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_symint_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_tensor_and_unbacked_symbol_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_tensor_to_list_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_tensor_with_unbacked_shape_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_unbacked_symbol_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_vmap_multiply_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_vmap_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_all_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_allow_local_assign_in_body_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_default_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_default_else_branch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_default_if_branch_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_only_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_pytree_args_nested_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_pytree_args_not_const_symint_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_pytree_args_with_symint_constant_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_pytree_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_subgraph_name_is_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_functional_call_disable_inline_nn_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_functional_call_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_call_compiled_backward_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_call_torch_compile_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_capture_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_closure_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_fn_with_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_freevar_python_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_freevar_tensor_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_non_tensor_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_over_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_pytree_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_two_tensor_all_grad_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_two_tensor_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_with_side_effect_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_hessian_argnums_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_hessian_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_randomness_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_two_tensors_argnums_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacrev_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacrev_has_aux_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacrev_two_tensors_argnums_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_call_torch_compile_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_freevar_python_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_freevar_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_jvp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_enable_disable_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_two_tensors_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_linearize_jvp_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_call_compiled_backward_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_multiple_outputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_multiple_outputs_python_struct_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_call_compiled_backward_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_call_torch_compile_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_free_const_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_free_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_get_wrapped_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_in_dims_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_out_dims_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_diff_dims_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_out_dims_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_new_tensor_implicit_via_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_new_tensor_in_body_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_new_tensor_unused_in_body_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_out_dims_None_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_over_vmap_captured_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_over_vmap_two_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_previous_illegal_op_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_pytree_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_recompile_different_config_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_recompile_same_config_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_recompile_with_randomness_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_side_effects_append_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_two_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_two_inputs_tuple_in_dims_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_with_conditional_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_with_graph_break_2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_with_graph_break_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_with_graph_break_lambda_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_LSTM_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_alias_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_autograd_expand_mutation_backwards_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_autograd_expand_mutation_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_autograd_expand_mutation_functionalizes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_autograd_raises_invalid_leaf_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_export_joint_simple_repro_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_grad_mode_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_sequence_nr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_non_tensor_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_non_tensor_arg_list_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_with_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_autograd_function_tangent_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_safe_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_unsafe_control_flow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_unsafe_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_data_ptr_access_copy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_data_ptr_access_fails_in_backward_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_data_ptr_access_fails_in_forward_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_different_inputs_overlapping_set_with_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer6_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer_with_retain_or_create_graph4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_double_backward_errors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_eager_sequence_nr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_grad_inputs_alias_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_inputs_overlapping_with_mutation_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_inputs_overlapping_with_mutation_stress_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_multiple_aot_autograd_calls_dupe_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_mutation1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_negative_testing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_negative_testing_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_nn_parameter_construction_dynamic_shapes, 
test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_no_storage_overlap_guards_no_aliasing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_no_storage_overlap_guards_no_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_requires_grad_fake_via_dynamo_recompiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_split_with_sizes_aot_autograd_cleans_up_traceback_meta_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesTestSDPA::test_graph_break_SDPAParams_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesTestSDPA::test_input_SDPAParams_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesTestSDPA::test_intermediate_attr_access_SDPAParams_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesTestSDPA::test_returns_SDPAParams_dynamic_shapes 2025-09-07T07:46:03.4396640Z 2025-09-07T07:46:03.4396800Z Running test_decomp 22/22 ... [2025-09-07 07:46:03.302832] 2025-09-07T07:46:03.4397151Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:03.4398032Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=22', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:46:03.303148] 2025-09-07T07:46:03.6841593Z 2025-09-07T07:46:03.6842454Z test_decomp 14/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_14.22_a097f3f6ff9cb5a8_.log 2025-09-07T07:46:03.6947048Z Running 401 items in this shard: test/test_decomp.py::TestDecompCUDA::test_arange_graph_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float8_e5m2fnuz, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_inner_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hamming_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unbind_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_log_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_logsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_float32, test/test_decomp.py::DecompOneOffTestsCUDA::test_exponential_non_inf_cuda 2025-09-07T07:46:03.7043977Z 2025-09-07T07:46:03.7044159Z Running dynamo/test_einops 1/1 ... [2025-09-07 07:46:03.684998] 2025-09-07T07:46:03.7044520Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:03.7045548Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_einops.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:46:03.685372] 2025-09-07T07:46:07.3554372Z 2025-09-07T07:46:07.3555436Z dynamo/test_einops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_einops_1.1_dd71f8d218d1fa12_.log 2025-09-07T07:46:07.3557395Z Running 3 items in this shard: test/dynamo/test_einops.py::TestEinops::test_functions_version_none, test/dynamo/test_einops.py::TestEinops::test_layers_version_none, test/dynamo/test_einops.py::TestEinops::test_no_recompile_on_lazy_state_version_none 2025-09-07T07:46:07.3558489Z 2025-09-07T07:46:07.3558703Z Running dynamo/test_callback 1/1 ... [2025-09-07 07:46:07.355499] 2025-09-07T07:46:07.3559139Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:07.3561758Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_callback.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:46:07.355897] 2025-09-07T07:46:14.1299274Z 2025-09-07T07:46:14.1300115Z dynamo/test_callback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_callback_1.1_8d353131cba625d2_.log 2025-09-07T07:46:14.1302357Z Running 4 items in this shard: test/dynamo/test_callback.py::CallbackTests::test_callbacks_with_duplicate_prevention, test/dynamo/test_callback.py::CallbackTests::test_counter, test/dynamo/test_callback.py::CallbackTests::test_counter_assertion, test/dynamo/test_callback.py::CallbackTests::test_triggers 2025-09-07T07:46:14.1303900Z 2025-09-07T07:46:14.1304169Z Running nn/test_parametrization 1/1 ... [2025-09-07 07:46:14.129992] 2025-09-07T07:46:14.1304701Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:14.1305981Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_parametrization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:46:14.130354] 2025-09-07T07:46:18.2006249Z 2025-09-07T07:46:18.2007323Z nn/test_parametrization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_parametrization_1.1_af4ec37731b0133c_.log 2025-09-07T07:46:18.2033030Z Running 58 items in this shard: test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_with_transfer_parametrizations_and_params_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_with_transfer_parametrizations_and_params_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_deepcopy_after_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_deepcopy_after_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_errors_parametrized_tensor_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_errors_parametrized_tensor_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_errors_unparametrized_tensor_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_errors_unparametrized_tensor_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_initialization_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_initialization_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_multiple_inputs_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_multiple_inputs_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_dim_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_dim_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_forward_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_forward_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_load_state_dict_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_load_state_dict_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_value_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_value_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_errors_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_errors_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_parametrization_same_training_mode_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_parametrization_same_training_mode_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_buffer_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_buffer_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_nested_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_nested_parametrization_swap_True, 
test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_parametrization_no_grad, test/nn/test_parametrization.py::TestNNParametrization::test_serialization_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_serialization_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_many_to_one_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_many_to_one_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_right_inverse_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_right_inverse_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_single_param_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_single_param_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_type_before_parametrizations_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_type_before_parametrizations_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_deepcopy_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_deepcopy_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_pickle_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_pickle_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_state_dict_compat_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_state_dict_compat_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_wrapper_subclass_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrizationDeviceCUDA::test_weight_norm_parametrization_swap_False_cuda, test/nn/test_parametrization.py::TestNNParametrizationDeviceCUDA::test_weight_norm_parametrization_swap_True_cuda 2025-09-07T07:46:18.2053577Z 2025-09-07T07:46:18.2053719Z Running test_masked 1/1 ... [2025-09-07 07:46:18.200727] 2025-09-07T07:46:18.2054139Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:18.2055034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_masked.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:46:18.201096] 2025-09-07T07:46:23.4233043Z 2025-09-07T07:46:23.4233952Z test_masked 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_masked_1.1_648f5cbb61db62d6_.log 2025-09-07T07:46:23.4291446Z Running 194 items in this shard: test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_bfloat16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int32, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_bfloat16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int32, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float32, 
test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int16, 
test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_where_coo_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_coo_fill_value_123_cuda, test/test_masked.py::TestMaskedCUDA::test_where_csr_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_csr_fill_value_123_cuda, test/test_masked.py::TestMaskedCUDA::test_where_hybrid_coo_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_hybrid_coo_fill_value_123_cuda 2025-09-07T07:46:23.4343512Z 2025-09-07T07:46:23.4343718Z Running export/test_experimental 1/1 ... 
[2025-09-07 07:46:23.423665] 2025-09-07T07:46:23.4344113Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:23.4345046Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_experimental.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:46:23.424034] 2025-09-07T07:46:27.2446549Z 2025-09-07T07:46:27.2448034Z export/test_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_experimental_1.1_e75c67fa8972b827_.log 2025-09-07T07:46:27.2452631Z Running 10 items in this shard: test/export/test_experimental.py::TestExperiment::test_export_add_in_out_info, test/export/test_experimental.py::TestExperiment::test_export_leaf, test/export/test_experimental.py::TestExperiment::test_joint_basic, test/export/test_experimental.py::TestExperiment::test_joint_buffer_input_mutations, test/export/test_experimental.py::TestExperiment::test_joint_cifar10_backwards, test/export/test_experimental.py::TestExperiment::test_joint_dynamic, test/export/test_experimental.py::TestExperiment::test_joint_loss_index, test/export/test_experimental.py::TestExperiment::test_sticky_export, test/export/test_experimental.py::TestExperiment::test_sticky_export_dynamic, test/export/test_experimental.py::TestExperiment::test_sticky_export_nested_inp 2025-09-07T07:46:27.2456029Z 2025-09-07T07:46:27.2456233Z Running nn/test_pruning 1/1 ... [2025-09-07 07:46:27.244745] 2025-09-07T07:46:27.2456604Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:27.2457785Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pruning.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:46:27.245114] 2025-09-07T07:46:30.9152632Z 2025-09-07T07:46:30.9153581Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_49d4f372f55fbc34_.log 2025-09-07T07:46:30.9175263Z Running 34 items in this shard: test/nn/test_pruning.py::TestPruningNN::test_compute_nparams_to_prune, test/nn/test_pruning.py::TestPruningNN::test_custom_from_mask_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_identity_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning_with_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_multiple_pruning_calls, test/nn/test_pruning.py::TestPruningNN::test_prune, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores_mimic_default, test/nn/test_pruning.py::TestPruningNN::test_pruning_container, test/nn/test_pruning.py::TestPruningNN::test_pruning_container_compute_mask, test/nn/test_pruning.py::TestPruningNN::test_pruning_id_consistency, test/nn/test_pruning.py::TestPruningNN::test_pruning_rollback, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_model, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_state_dict, test/nn/test_pruning.py::TestPruningNN::test_random_pruning, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_0perc, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_new_weight, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_orig, 
test/nn/test_pruning.py::TestPruningNN::test_random_pruning_pickle, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_sizes, test/nn/test_pruning.py::TestPruningNN::test_random_structured_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_exception, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_rnn_pruning, test/nn/test_pruning.py::TestPruningNN::test_unstructured_pruning_same_magnitude, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount_init 2025-09-07T07:46:30.9183456Z 2025-09-07T07:46:30.9183657Z Running export/test_converter 1/1 ... [2025-09-07 07:46:30.915336] 2025-09-07T07:46:30.9184022Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:30.9184936Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_converter.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:46:30.915699] 2025-09-07T07:46:47.0523534Z 2025-09-07T07:46:47.0524683Z export/test_converter 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_converter_1.1_8b121d5d1374ede9_.log 2025-09-07T07:46:47.0540944Z Running 48 items in this shard: test/export/test_converter.py::TestConverter::test_aten___getitem___dict, test/export/test_converter.py::TestConverter::test_aten___getitem___list, test/export/test_converter.py::TestConverter::test_aten___is__, test/export/test_converter.py::TestConverter::test_aten___isnot__, test/export/test_converter.py::TestConverter::test_aten___not__, test/export/test_converter.py::TestConverter::test_aten_add_t, test/export/test_converter.py::TestConverter::test_aten_append_t, test/export/test_converter.py::TestConverter::test_aten_dim, test/export/test_converter.py::TestConverter::test_aten_floordiv, test/export/test_converter.py::TestConverter::test_aten_len, test/export/test_converter.py::TestConverter::test_aten_tensor_dtype_int, test/export/test_converter.py::TestConverter::test_aten_tensor_dynamic, test/export/test_converter.py::TestConverter::test_aten_tensor_prim_dtype, test/export/test_converter.py::TestConverter::test_aten_to_dtype_with_mutating_storage, test/export/test_converter.py::TestConverter::test_context_manager, test/export/test_converter.py::TestConverter::test_convert_func_without_param, test/export/test_converter.py::TestConverter::test_convert_if_basic, test/export/test_converter.py::TestConverter::test_convert_if_duplicate_attr_names, test/export/test_converter.py::TestConverter::test_convert_if_multiple_out, test/export/test_converter.py::TestConverter::test_convert_if_tuple_out, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_buffer, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_if_and_buffer, 
test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_if_and_param, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_param, test/export/test_converter.py::TestConverter::test_convert_retrace_nested_scripted_modules, test/export/test_converter.py::TestConverter::test_convert_script_object, test/export/test_converter.py::TestConverter::test_get_tensor_constants, test/export/test_converter.py::TestConverter::test_hidden_input_name, test/export/test_converter.py::TestConverter::test_implicit_constant_to_tensor_handling, test/export/test_converter.py::TestConverter::test_prim_SetAttr, test/export/test_converter.py::TestConverter::test_prim_device, test/export/test_converter.py::TestConverter::test_prim_device_cuda, test/export/test_converter.py::TestConverter::test_prim_dtype, test/export/test_converter.py::TestConverter::test_prim_max, test/export/test_converter.py::TestConverter::test_prim_min, test/export/test_converter.py::TestConverter::test_prim_tolist, test/export/test_converter.py::TestConverter::test_profiler__record_function, test/export/test_converter.py::TestConverter::test_raise_exception, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model_with_opcontext, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model_with_opcontext_and_constant, test/export/test_converter.py::TestConverter::test_ts2ep_converter_basic, test/export/test_converter.py::TestConverter::test_ts2ep_converter_container_output, test/export/test_converter.py::TestConverter::test_ts2ep_converter_contains, test/export/test_converter.py::TestConverter::test_ts2ep_converter_custom_op, test/export/test_converter.py::TestConverter::test_ts2ep_converter_unpack, test/export/test_converter.py::TestConverter::test_ts2ep_multi_outputs_on_call_ops, 
test/export/test_converter.py::TestConverter::test_ts2ep_with_loop 2025-09-07T07:46:47.0552901Z 2025-09-07T07:46:47.0553096Z Running test_bundled_inputs 1/1 ... [2025-09-07 07:46:47.052451] 2025-09-07T07:46:47.0553462Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:47.0554360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_bundled_inputs.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:46:47.052814] 2025-09-07T07:46:50.6723829Z 2025-09-07T07:46:50.6724936Z test_bundled_inputs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_bundled_inputs_1.1_9366a34c27a4f2ff_.log 2025-09-07T07:46:50.6731092Z Running 12 items in this shard: test/test_bundled_inputs.py::TestBundledInputs::test_bad_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_dict_args, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_fail, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_non_mutator, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_success, test/test_bundled_inputs.py::TestBundledInputs::test_large_tensor_with_inflation, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_both_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_neither_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_non_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_rejected_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_single_tensors 2025-09-07T07:46:50.6736400Z 2025-09-07T07:46:50.6736682Z Running inductor/test_fxir_backend 1/1 ... 
[2025-09-07 07:46:50.672457] 2025-09-07T07:46:50.6737131Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:50.6738194Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fxir_backend.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:46:50.672806] 2025-09-07T07:46:57.4964475Z 2025-09-07T07:46:57.4965635Z inductor/test_fxir_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fxir_backend_1.1_63c3d60dfc541d4c_.log 2025-09-07T07:46:57.4980689Z Running 38 items in this shard: test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_False_use_dynamic_shapes_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_False_use_dynamic_shapes_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_True_use_dynamic_shapes_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_True_use_dynamic_shapes_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_backward, test/inductor/test_fxir_backend.py::FxirTestCase::test_basic, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_inputs, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_reinterpret_view, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_to_alloc, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_views, test/inductor/test_fxir_backend.py::FxirTestCase::test_cpp_raises, test/inductor/test_fxir_backend.py::FxirTestCase::test_custom_compiler, test/inductor/test_fxir_backend.py::FxirTestCase::test_custom_triton, test/inductor/test_fxir_backend.py::FxirTestCase::test_debug, test/inductor/test_fxir_backend.py::FxirTestCase::test_duplicate_input, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_launch_grid_calc_python, 
test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_launch_grid_calc_python_slow, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_and_strides, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_precomputed_size, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape0, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape1, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape2, test/inductor/test_fxir_backend.py::FxirTestCase::test_export_const_placeholder_const_1, test/inductor/test_fxir_backend.py::FxirTestCase::test_export_const_placeholder_const_1_5, test/inductor/test_fxir_backend.py::FxirTestCase::test_extern, test/inductor/test_fxir_backend.py::FxirTestCase::test_extern_multi_output, test/inductor/test_fxir_backend.py::FxirTestCase::test_fallback, test/inductor/test_fxir_backend.py::FxirTestCase::test_free, test/inductor/test_fxir_backend.py::FxirTestCase::test_multiple_kernels, test/inductor/test_fxir_backend.py::FxirTestCase::test_output_slice_view, test/inductor/test_fxir_backend.py::FxirTestCase::test_reshape_output, test/inductor/test_fxir_backend.py::FxirTestCase::test_subgraph_raises, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_add, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_const, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_dynamic, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_linear, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_custom_backend, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_custom_triton_autotune_dynamic 2025-09-07T07:46:57.4991900Z 2025-09-07T07:46:57.4992160Z Running torch_np/numpy_tests/lib/test_histograms 1/1 ... 
[2025-09-07 07:46:57.496489] 2025-09-07T07:46:57.4992599Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:46:57.4993594Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_histograms.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:46:57.496823] 2025-09-07T07:47:01.8176301Z 2025-09-07T07:47:01.8177603Z torch_np/numpy_tests/lib/test_histograms 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_histograms_1.1_4c4a9679c563e56a_.log 2025-09-07T07:47:01.8202164Z Running 60 items in this shard: test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_arr_weights_mismatch, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_big_arrays, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_bin_array_dims, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_bin_edge_cases, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_bool_conversion, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_density, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_empty, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_error_binnum_type, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_exotic_weights, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_f32_rounding, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_finite_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_histogram_bin_edges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_invalid_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_last_bin_inclusive_range, 
test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_no_side_effects, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_object_array_of_0d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_one_bin, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_outliers, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_precision, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_signed_overflow_bounds, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_signed_overflow_bounds_2, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_simple, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_some_nan_values, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_type, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_unsigned_monotonicity_check, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_weights, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_empty, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_incorrect_methods, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_limited_variance, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_novariance, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_outlier, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_scott_vs_stone, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_auto, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_doane, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_fd, 
test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_rice, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_scott, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_stone, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_sturges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_simple, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_simple_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_simple_weighted, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_small, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_bins_array, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_bins_error_2, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_bins_errors, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_density_non_uniform_1d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_density_non_uniform_2d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_edge_dtype, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_empty, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_equal_edges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_finite_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_identical_samples, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_inf_edges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_large_integers, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_rightmost_binedge, 
test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_shape_3d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_shape_4d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_simple, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_weights 2025-09-07T07:47:01.8220093Z 2025-09-07T07:47:01.8220379Z Running test_maskedtensor 1/1 ... [2025-09-07 07:47:01.817797] 2025-09-07T07:47:01.8220740Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:01.8221639Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_maskedtensor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:01.818189] 2025-09-07T07:47:04.2366398Z 2025-09-07T07:47:04.2367565Z inductor/test_torchinductor_dynamic_shapes 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_1.2_b5d1f08e0ddbf7e5_.log 2025-09-07T07:47:04.2717878Z Running 851 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_matmul_4bit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__unsafe_masked_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d_low_prec_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex9_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_const_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_const_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_inplace_permuted_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_addmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_addmv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_alexnet_prefix_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aliased_buffer_reuse_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_angle_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_cache_hit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_dtype_device_layout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin_with_duplicates_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin_with_nan_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_min_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_to_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d4_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool_errors_with_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bernoulli1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bmm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int8_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_int_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_empty_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_negative_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_unbacked_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_upcasting_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_chunk_recompiles_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clone_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_compar_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_complex_memory_overlap_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_config_option_dont_assume_alignment_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_consecutive_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_consecutive_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_1d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_fill_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv2d_backward_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_shape_check_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cudnn_rnn_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cummin_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_default_layout_constraint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_would_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_data_type_propogation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dense_mask_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_deterministic_codegen_with_suffix_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_device_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_diagonal_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dist_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div4_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div9_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_precision_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_softmax_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dont_constant_fold_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_deterministic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int8_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_bag_byte_unpack_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_bag_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_exp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expanded_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fallback_mutable_op_basic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fallback_mutable_op_list_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fill2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float32_to_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_index_expression_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_index_expression_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmin_fmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_like_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_truncation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_functionalize_rng_wrappers_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fusing_write_into_disjoint_read_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gather1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gather2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gelu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generated_code_has_alignment_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_argmax_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_constant_tensor1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_misaligned_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_mutation_real_name_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_no_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_pad_dynamic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_scalar_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_unbacked_symint_as_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_hardsigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_hardswish_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_device_assert_masked_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_flip_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_nested_indirect_indexing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_deterministic_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_remainder_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_select_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_multiple_specializations_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_activations_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_int_input_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_invalid_operand_issue1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_issue102546_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_grid_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_offset_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_leaky_relu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lerp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lgamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logaddexp_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logcumsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logcumsumexp_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_2_dim_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_2_dim_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_masked_fill_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_masked_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_min_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward3_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_min_max_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_misaligned_address_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mixed_mm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mixed_mm3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mm_views_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mul_softmax_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multi_gpu_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multi_threading_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_var_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_var_lowp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutable_custom_op_fixed_layout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutations_loop_fusion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_sort_stable_True_descending_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_sort_stable_True_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_needs_contiguous_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_neg_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nll_loss_forward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pad_single_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pattern_matcher_multi_user_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pattern_matcher_unbacked_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_permute2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_philox_rand_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_j1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_y0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfc_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfcx_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammainc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_hermite_polynomial_h_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_legendre_polynomial_p_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_log_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_logit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_ndtri_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_polygamma_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_sinc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_xlog1py_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_zeta_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_polar_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_profiler_mark_wrapper_call_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rand_like_deterministic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_like_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_with_dtype_and_device_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reinterpret_dtypeview_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remainder_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_no_ops_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_clone_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_replication_pad_errors_with_bool_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_require_stride_expanded_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reuse_buffers_with_aliasing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_round_correctness_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_reduce1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_reduce2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_reduce3_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_searchsorted_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sgn_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sgn_extremal_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_shape_prop_torch_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_should_pad_bench_for_bmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sign_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sin_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_single_elem_indirect_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter3_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_backward_data_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sort_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sort_stable_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_special_polygamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumsum_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumsum_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_failed_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_with_integer_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_with_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_squeeze1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_squeeze2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_squeeze_varargs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_stack_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_std_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_strided_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum_keepdims_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tanh_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_to_device_constant_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_topk_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_transpose_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_transposed_propagates_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsqueeze_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_bilinear2d_a_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_cat_conv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_nearest2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_tile_reduction_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vdd_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vectorized_ops_masked_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_on_aliased_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_uint8_through_differing_bitwidths_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_xblock_divides_xnumel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_zero_dim_reductions_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_matmul_4bit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_pack_4bit_weight_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_const_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_const_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_addmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_allow_reuse_active_if_under_peak_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_angle_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_support_out_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin_with_duplicates_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_min_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_alignment_op_name_pass_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bfloat16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bmm1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_add_autotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int32_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_copied_in_graph_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_extern_kernel_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_negative_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_empty_1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cauchy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_chunk_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_memory_overlap_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_concat_add_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_3d_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv2d_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cummin_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumprod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_no_mask_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_op_compiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_op_multi_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_would_split_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_data_type_propogation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dense_mask_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_deterministic_codegen_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_deterministic_codegen_with_suffix_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_device_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_diagonal_copy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dist_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_prim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_deterministic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_trivial_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_embedding_bag_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_embedding_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_erfc_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_list_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_with_return_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fft_real_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float32_to_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fmod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fractional_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fractional_max_pool2d4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_like_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_functionalize_rng_wrappers_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fuse_large_params_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fuse_tiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fusing_write_into_disjoint_read_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gelu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_generated_code_has_alignment_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_argmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_mutation_real_name_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_pad_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_hardsigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_hardtanh_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_horizonal_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_dynamic_shapes_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_as_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_deterministic_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_multiple_specializations_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_where_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_insignificant_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_int8_weight_only_quant_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_int_input_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isin_tensor_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isinf2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_kernel_names_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_grid_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_offset_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_tensor_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_layer_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lgamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_dynamic_maxautotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linspace1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linspace2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linspace3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logcumsumexp_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_2_dim_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_matmul_layer_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d4_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d6_dilation_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_min_max_reduction_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mixed_mm3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mm_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mul_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_gpu_recompile_on_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_threading_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_any_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_sum_low_prec_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_var_lowp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mutable_custom_op_fixed_layout_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mutations_loop_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_to_num_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_neg_max_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_new_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_new_empty_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_cast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_philox_rand_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_airy_ai_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_y0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_expit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_expm1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_gammainc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i1e_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_legendre_polynomial_p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_logit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_ndtr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_ndtri_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_psi_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_sinc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_xlogy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_polar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_distribution_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_kernel_count_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randn_generator_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randn_like_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randn_with_dtype_and_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_copy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_view_dtype_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_replication_pad_errors_with_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_resize_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reuse_buffers_with_aliasing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_roi_align_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_add1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scheduler_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_unaligned_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_searchsorted_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_setitem_with_int_parameter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sgn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sgn_extremal_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_should_pad_bench_for_bmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sign_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_silu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_simplify_loops_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sin_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_single_elem_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sizehint_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_view_with_graph_break_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_backward_data_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_special_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumsum_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumsum_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_reduction_dynamic_shape_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_reduction_with_int64_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_squeeze1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_squeeze2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_std_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_strided_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_keepdims_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor_index_put_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tmp_not_defined_issue2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_memory_format_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_uint4x2_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_uint_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_bicubic2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_div_by_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_tile_reduction_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_tile_reduction_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vdd_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vectorized_ops_masked_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_as_real_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_detach_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_adaptive_max_pool3d_with_indices_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_arithmetic_constant_folding_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_bool_mask_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_dynamic_stride_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_is_integer_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_neginf_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_floor_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_full_symbolic_value_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_interpolate_ceil_eq_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_bool_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_materialize_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_to_inputs_kernel_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op1_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op2_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op3_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op7_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op8_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_nonzero_no_realloc_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_nonzero_size_factory_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_pad_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_shape_as_constant_reciprocal_float_exp_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sort_dynamic_shape_with_check_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sub_constant_folding_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sym_sum_unbacked_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_save_data_dependent_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_matmul_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_reduction_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_save_for_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_fallback_specialization_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_fallback_symint_specialization_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_operations_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_softshrink_cuda 2025-09-07T07:47:04.3052483Z 2025-09-07T07:47:04.3052656Z Running test_autograd 1/1 ... [2025-09-07 07:47:04.238357] 2025-09-07T07:47:04.3052997Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:04.3053991Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:04.238726] 2025-09-07T07:47:08.5918214Z 2025-09-07T07:47:08.5919136Z test_maskedtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_maskedtensor_1.1_3fe577319814a48c_.log 2025-09-07T07:47:08.6183058Z Running 958 items in this shard: test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn0, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn1, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn10, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn11, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn12, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn13, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn14, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn15, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn16, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn17, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn18, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn19, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn2, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn20, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn21, 
test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn22, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn23, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn24, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn25, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn26, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn27, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn28, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn29, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn3, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn30, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn31, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn32, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn33, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn34, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn35, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn36, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn37, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn38, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn39, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn4, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn40, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn41, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn42, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn43, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn44, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn45, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn46, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn47, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn48, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn49, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn5, 
test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn50, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn51, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn52, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn53, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn54, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn55, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn56, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn57, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn6, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn7, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn8, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn9, test/test_maskedtensor.py::TestUnary::test_unary_fn0, test/test_maskedtensor.py::TestUnary::test_unary_fn1, test/test_maskedtensor.py::TestUnary::test_unary_fn10, test/test_maskedtensor.py::TestUnary::test_unary_fn11, test/test_maskedtensor.py::TestUnary::test_unary_fn12, test/test_maskedtensor.py::TestUnary::test_unary_fn13, test/test_maskedtensor.py::TestUnary::test_unary_fn14, test/test_maskedtensor.py::TestUnary::test_unary_fn15, test/test_maskedtensor.py::TestUnary::test_unary_fn16, test/test_maskedtensor.py::TestUnary::test_unary_fn17, test/test_maskedtensor.py::TestUnary::test_unary_fn18, test/test_maskedtensor.py::TestUnary::test_unary_fn19, test/test_maskedtensor.py::TestUnary::test_unary_fn2, test/test_maskedtensor.py::TestUnary::test_unary_fn20, test/test_maskedtensor.py::TestUnary::test_unary_fn21, test/test_maskedtensor.py::TestUnary::test_unary_fn22, test/test_maskedtensor.py::TestUnary::test_unary_fn23, test/test_maskedtensor.py::TestUnary::test_unary_fn24, test/test_maskedtensor.py::TestUnary::test_unary_fn25, test/test_maskedtensor.py::TestUnary::test_unary_fn26, test/test_maskedtensor.py::TestUnary::test_unary_fn27, test/test_maskedtensor.py::TestUnary::test_unary_fn28, 
test/test_maskedtensor.py::TestUnary::test_unary_fn29, test/test_maskedtensor.py::TestUnary::test_unary_fn3, test/test_maskedtensor.py::TestUnary::test_unary_fn30, test/test_maskedtensor.py::TestUnary::test_unary_fn31, test/test_maskedtensor.py::TestUnary::test_unary_fn32, test/test_maskedtensor.py::TestUnary::test_unary_fn33, test/test_maskedtensor.py::TestUnary::test_unary_fn34, test/test_maskedtensor.py::TestUnary::test_unary_fn35, test/test_maskedtensor.py::TestUnary::test_unary_fn36, test/test_maskedtensor.py::TestUnary::test_unary_fn37, test/test_maskedtensor.py::TestUnary::test_unary_fn38, test/test_maskedtensor.py::TestUnary::test_unary_fn39, test/test_maskedtensor.py::TestUnary::test_unary_fn4, test/test_maskedtensor.py::TestUnary::test_unary_fn40, test/test_maskedtensor.py::TestUnary::test_unary_fn41, test/test_maskedtensor.py::TestUnary::test_unary_fn42, test/test_maskedtensor.py::TestUnary::test_unary_fn43, test/test_maskedtensor.py::TestUnary::test_unary_fn44, test/test_maskedtensor.py::TestUnary::test_unary_fn45, test/test_maskedtensor.py::TestUnary::test_unary_fn46, test/test_maskedtensor.py::TestUnary::test_unary_fn47, test/test_maskedtensor.py::TestUnary::test_unary_fn48, test/test_maskedtensor.py::TestUnary::test_unary_fn49, test/test_maskedtensor.py::TestUnary::test_unary_fn5, test/test_maskedtensor.py::TestUnary::test_unary_fn50, test/test_maskedtensor.py::TestUnary::test_unary_fn51, test/test_maskedtensor.py::TestUnary::test_unary_fn52, test/test_maskedtensor.py::TestUnary::test_unary_fn53, test/test_maskedtensor.py::TestUnary::test_unary_fn54, test/test_maskedtensor.py::TestUnary::test_unary_fn55, test/test_maskedtensor.py::TestUnary::test_unary_fn56, test/test_maskedtensor.py::TestUnary::test_unary_fn57, test/test_maskedtensor.py::TestUnary::test_unary_fn58, test/test_maskedtensor.py::TestUnary::test_unary_fn59, test/test_maskedtensor.py::TestUnary::test_unary_fn6, test/test_maskedtensor.py::TestUnary::test_unary_fn60, 
test/test_maskedtensor.py::TestUnary::test_unary_fn61, test/test_maskedtensor.py::TestUnary::test_unary_fn7, test/test_maskedtensor.py::TestUnary::test_unary_fn8, test/test_maskedtensor.py::TestUnary::test_unary_fn9, test/test_maskedtensor.py::TestBinary::test_binary_fn0, test/test_maskedtensor.py::TestBinary::test_binary_fn1, test/test_maskedtensor.py::TestBinary::test_binary_fn10, test/test_maskedtensor.py::TestBinary::test_binary_fn11, test/test_maskedtensor.py::TestBinary::test_binary_fn12, test/test_maskedtensor.py::TestBinary::test_binary_fn13, test/test_maskedtensor.py::TestBinary::test_binary_fn14, test/test_maskedtensor.py::TestBinary::test_binary_fn15, test/test_maskedtensor.py::TestBinary::test_binary_fn16, test/test_maskedtensor.py::TestBinary::test_binary_fn17, test/test_maskedtensor.py::TestBinary::test_binary_fn18, test/test_maskedtensor.py::TestBinary::test_binary_fn19, test/test_maskedtensor.py::TestBinary::test_binary_fn2, test/test_maskedtensor.py::TestBinary::test_binary_fn20, test/test_maskedtensor.py::TestBinary::test_binary_fn21, test/test_maskedtensor.py::TestBinary::test_binary_fn22, test/test_maskedtensor.py::TestBinary::test_binary_fn23, test/test_maskedtensor.py::TestBinary::test_binary_fn24, test/test_maskedtensor.py::TestBinary::test_binary_fn25, test/test_maskedtensor.py::TestBinary::test_binary_fn26, test/test_maskedtensor.py::TestBinary::test_binary_fn27, test/test_maskedtensor.py::TestBinary::test_binary_fn28, test/test_maskedtensor.py::TestBinary::test_binary_fn29, test/test_maskedtensor.py::TestBinary::test_binary_fn3, test/test_maskedtensor.py::TestBinary::test_binary_fn30, test/test_maskedtensor.py::TestBinary::test_binary_fn31, test/test_maskedtensor.py::TestBinary::test_binary_fn32, test/test_maskedtensor.py::TestBinary::test_binary_fn33, test/test_maskedtensor.py::TestBinary::test_binary_fn34, test/test_maskedtensor.py::TestBinary::test_binary_fn35, test/test_maskedtensor.py::TestBinary::test_binary_fn4, 
test/test_maskedtensor.py::TestBinary::test_binary_fn5, test/test_maskedtensor.py::TestBinary::test_binary_fn6, test/test_maskedtensor.py::TestBinary::test_binary_fn7, test/test_maskedtensor.py::TestBinary::test_binary_fn8, test/test_maskedtensor.py::TestBinary::test_binary_fn9, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn0, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn1, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn10, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn11, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn12, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn13, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn14, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn15, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn16, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn17, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn18, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn19, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn2, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn20, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn21, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn22, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn23, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn24, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn25, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn26, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn27, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn28, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn29, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn3, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn4, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn5, 
test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn6, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn7, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn8, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn9, test/test_maskedtensor.py::TestBinary::test_masks_match_fn_name_add, test/test_maskedtensor.py::TestBinary::test_masks_match_fn_name_add_, test/test_maskedtensor.py::TestReductions::test__is_any_true, test/test_maskedtensor.py::TestReductions::test__is_any_true_false, test/test_maskedtensor.py::TestReductions::test_all, test/test_maskedtensor.py::TestReductions::test_amax, test/test_maskedtensor.py::TestReductions::test_amax_grad, test/test_maskedtensor.py::TestReductions::test_amin, test/test_maskedtensor.py::TestReductions::test_amin_grad, test/test_maskedtensor.py::TestReductions::test_any_true_dtype, test/test_maskedtensor.py::TestReductions::test_backward, test/test_maskedtensor.py::TestReductions::test_grad_dtype, test/test_maskedtensor.py::TestReductions::test_max_not_implemented, test/test_maskedtensor.py::TestReductions::test_mean, test/test_maskedtensor.py::TestReductions::test_mean_dim_grad, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1a, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1b, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1c, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1d, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1e, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1f, test/test_maskedtensor.py::TestReductions::test_prod, test/test_maskedtensor.py::TestReductions::test_prod_grad, test/test_maskedtensor.py::TestReductions::test_sum, test/test_maskedtensor.py::TestReductions::test_sum_grad, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float64, test/test_maskedtensor.py::TestBasicsCUDA::test_add_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_contiguous_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_dim_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_layouts_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_sizes_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_grad_warning_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_coo_values_cuda, 
test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_csr_values_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_layout_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_tensor_inputs_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_nn_unfold_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_softmax_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_stack_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_and_sparse_coo_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_and_sparse_csr_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_device_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dtype_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_sparse_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_unfold_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_where_cuda 2025-09-07T07:47:08.6433377Z 2025-09-07T07:47:08.6433574Z Running dynamo/test_reorder_logs 1/1 ... [2025-09-07 07:47:08.593207] 2025-09-07T07:47:08.6433951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:08.6434859Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:08.593560] 2025-09-07T07:47:10.8090208Z 2025-09-07T07:47:10.8091702Z test_decomp 18/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_18.22_556ab6b68dc253d0_.log 2025-09-07T07:47:10.8218293Z Running 436 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_igammac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_inner_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_triangular_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanquantile_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_deg2rad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_std_mean_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_igammac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_native_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_native_batch_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_polar_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_eval_mode_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_train_mode_cuda_float32, test/test_decomp.py::DecompOneOffTestsCUDA::test_threshold_backward_dtype_cuda 2025-09-07T07:47:10.8338955Z 2025-09-07T07:47:10.8339198Z Running dynamo/test_exceptions 1/1 ... [2025-09-07 07:47:10.809557] 2025-09-07T07:47:10.8339621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:10.8340640Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_exceptions.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:10.809914] 2025-09-07T07:47:12.3634652Z 2025-09-07T07:47:12.3635708Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_3b7523d6a996ce74_.log 2025-09-07T07:47:12.3643069Z Running 14 items in this shard: test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method0_fn0_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method1_fn1_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method2_fn2_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method3_fn3_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method4_fn4_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method5_fn5_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method6_fn6_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method7_fn7_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_dont_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_custom_log_fn, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print_graph_break, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_warnings 2025-09-07T07:47:12.3649894Z 2025-09-07T07:47:12.3650296Z Running export/test_lift_unlift 1/1 ... 
[2025-09-07 07:47:12.363496] 2025-09-07T07:47:12.3650732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:12.3651777Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_lift_unlift.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:12.363863] 2025-09-07T07:47:14.9801929Z 2025-09-07T07:47:14.9802893Z dynamo/test_exceptions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_exceptions_1.1_b5f9a007b69d4515_.log 2025-09-07T07:47:14.9820385Z Running 48 items in this shard: test/dynamo/test_exceptions.py::ExceptionTests::test_atrribute_error, test/dynamo/test_exceptions.py::ExceptionTests::test_attribute_error_from_getattr, test/dynamo/test_exceptions.py::ExceptionTests::test_autocast_with_exception, test/dynamo/test_exceptions.py::ExceptionTests::test_block_stack_cleanup, test/dynamo/test_exceptions.py::ExceptionTests::test_custom_getattr_on_module_exception, test/dynamo/test_exceptions.py::ExceptionTests::test_dict_pop, test/dynamo/test_exceptions.py::ExceptionTests::test_dynamo_undo_kw_names, test/dynamo/test_exceptions.py::ExceptionTests::test_ensure_exception_is_active_after_try_except_block, test/dynamo/test_exceptions.py::ExceptionTests::test_ensure_exception_is_active_inside_try_except_block, test/dynamo/test_exceptions.py::ExceptionTests::test_exception, test/dynamo/test_exceptions.py::ExceptionTests::test_exception2, test/dynamo/test_exceptions.py::ExceptionTests::test_exception3, test/dynamo/test_exceptions.py::ExceptionTests::test_exception4, test/dynamo/test_exceptions.py::ExceptionTests::test_exception_else, test/dynamo/test_exceptions.py::ExceptionTests::test_exception_raised_from_child, test/dynamo/test_exceptions.py::ExceptionTests::test_exception_with_another_exception, 
test/dynamo/test_exceptions.py::ExceptionTests::test_exception_with_another_exception2, test/dynamo/test_exceptions.py::ExceptionTests::test_exception_with_ctx_manager, test/dynamo/test_exceptions.py::ExceptionTests::test_exception_with_vars, test/dynamo/test_exceptions.py::ExceptionTests::test_handle_all_exceptions, test/dynamo/test_exceptions.py::ExceptionTests::test_isinstance_CustomException, test/dynamo/test_exceptions.py::ExceptionTests::test_key_error, test/dynamo/test_exceptions.py::ExceptionTests::test_nn_module_getattr, test/dynamo/test_exceptions.py::ExceptionTests::test_nn_reraise, test/dynamo/test_exceptions.py::ExceptionTests::test_propagate_exception_inside_ctx_manager, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_GeneratorExit, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_custom_exception, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_custom_exception_with_args, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_finally_simple, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_from_None, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_from_None_2, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_from_other, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_match, test/dynamo/test_exceptions.py::ExceptionTests::test_raise_set___context__, test/dynamo/test_exceptions.py::ExceptionTests::test_reconstruct___context__, test/dynamo/test_exceptions.py::ExceptionTests::test_reconstruct_exception_2, test/dynamo/test_exceptions.py::ExceptionTests::test_reraise, test/dynamo/test_exceptions.py::ExceptionTests::test_reraise_first_exc, test/dynamo/test_exceptions.py::ExceptionTests::test_set___cause___CustomException, test/dynamo/test_exceptions.py::ExceptionTests::test_set___cause___TypeError, test/dynamo/test_exceptions.py::ExceptionTests::test_set___cause___error_CustomException, test/dynamo/test_exceptions.py::ExceptionTests::test_set___cause___error_RuntimeError, 
test/dynamo/test_exceptions.py::ExceptionTests::test_set_cause_with_arg, test/dynamo/test_exceptions.py::ExceptionTests::test_set_cause_with_arg_error, test/dynamo/test_exceptions.py::ExceptionTests::test_speculation_exception, test/dynamo/test_exceptions.py::ExceptionTests::test_stop_iteration, test/dynamo/test_exceptions.py::ExceptionTests::test_user_defined_exception_variable, test/dynamo/test_exceptions.py::ExceptionTests::test_user_defined_exception_with_args 2025-09-07T07:47:14.9832909Z 2025-09-07T07:47:14.9833096Z Running test_public_bindings 1/1 ... [2025-09-07 07:47:14.980240] 2025-09-07T07:47:14.9833468Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:14.9834344Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_public_bindings.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:14.980644] 2025-09-07T07:47:16.6344008Z 2025-09-07T07:47:16.6345156Z export/test_lift_unlift 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_lift_unlift_1.1_6eb6518b2b608865_.log 2025-09-07T07:47:16.6348217Z Running 5 items in this shard: test/export/test_lift_unlift.py::TestLift::test_duplicate_constant_access, test/export/test_lift_unlift.py::TestLift::test_lift_basic, test/export/test_lift_unlift.py::TestLift::test_lift_nested, test/export/test_lift_unlift.py::TestLift::test_unlift_nonpersistent_buffer, test/export/test_lift_unlift.py::ConstantAttrMapTest::test_dict_api 2025-09-07T07:47:16.6350052Z 2025-09-07T07:47:16.6351881Z Running dynamo/test_exc 1/1 ... 
[2025-09-07 07:47:16.634569] 2025-09-07T07:47:16.6352336Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:16.6353529Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_exc.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:16.634958] 2025-09-07T07:47:16.7700441Z 2025-09-07T07:47:16.7701013Z test_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_1.1_c9a9901586b4f767_.log 2025-09-07T07:47:16.7874793Z Running 651 items in this shard: test/test_autograd.py::TestAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/test_autograd.py::TestAutograd::test_accumulate_grad, test/test_autograd.py::TestAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/test_autograd.py::TestAutograd::test_accumulate_grad_posthooks_should_not_execute, test/test_autograd.py::TestAutograd::test_accumulate_grad_tensor_reference, test/test_autograd.py::TestAutograd::test_accumulate_grad_with_zero_numel_grad, test/test_autograd.py::TestAutograd::test_anomaly_assign_parent_cleanup, test/test_autograd.py::TestAutograd::test_anomaly_detect_nan, test/test_autograd.py::TestAutograd::test_anomaly_grad_warnings, test/test_autograd.py::TestAutograd::test_anomaly_mode_no_check_nan, test/test_autograd.py::TestAutograd::test_attribute_deletion, test/test_autograd.py::TestAutograd::test_autograd_inplace_view_of_view, test/test_autograd.py::TestAutograd::test_autograd_inplace_views_creation_meta, test/test_autograd.py::TestAutograd::test_autograd_inplace_views_cross_dtype, test/test_autograd.py::TestAutograd::test_autograd_multiple_views_python, test/test_autograd.py::TestAutograd::test_autograd_node_isinstance, test/test_autograd.py::TestAutograd::test_autograd_print_tensor, 
test/test_autograd.py::TestAutograd::test_autograd_python_custom_function_inplace, test/test_autograd.py::TestAutograd::test_autograd_simple_views_python, test/test_autograd.py::TestAutograd::test_autograd_views_codegen, test/test_autograd.py::TestAutograd::test_backward, test/test_autograd.py::TestAutograd::test_backward_badcalls, test/test_autograd.py::TestAutograd::test_backward_copy, test/test_autograd.py::TestAutograd::test_backward_create_graph_warns, test/test_autograd.py::TestAutograd::test_backward_hook_relative_ordering, test/test_autograd.py::TestAutograd::test_backward_no_grad, test/test_autograd.py::TestAutograd::test_backward_to_node, test/test_autograd.py::TestAutograd::test_backward_twice_retained_graph_with_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_retained_graph_without_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_with_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_without_saved_values, test/test_autograd.py::TestAutograd::test_backward_with_inputs, test/test_autograd.py::TestAutograd::test_backward_with_nonleaf_inputs, test/test_autograd.py::TestAutograd::test_backward_with_scalar_input, test/test_autograd.py::TestAutograd::test_calculate_shape_util, test/test_autograd.py::TestAutograd::test_callback_adds_callback, test/test_autograd.py::TestAutograd::test_callback_propagates_errors_from_device_thread, test/test_autograd.py::TestAutograd::test_cant_create_saved_tensors, test/test_autograd.py::TestAutograd::test_checkpoint_detects_non_determinism, test/test_autograd.py::TestAutograd::test_checkpoint_sequential_warns_if_use_reentrant_not_passed_explcitly, test/test_autograd.py::TestAutograd::test_checkpoint_valid_reset_on_error, test/test_autograd.py::TestAutograd::test_checkpoint_warns_if_use_reentrant_not_passed_explcitly, test/test_autograd.py::TestAutograd::test_checkpointing, test/test_autograd.py::TestAutograd::test_checkpointing_non_reentrant_autocast_cpu, 
test/test_autograd.py::TestAutograd::test_checkpointing_non_reentrant_autocast_gpu, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_arbitrary_input_output, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_correct_grad, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_custom_function_works, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_dataparallel, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_True, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_memory_savings, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_parameter_used_in_an_out, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_saved_object_identity, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_with_context_fn, test/test_autograd.py::TestAutograd::test_copy_slices_graph_task_updates, test/test_autograd.py::TestAutograd::test_create_graph_and_full_backward_hook_cycle, test/test_autograd.py::TestAutograd::test_current_graph_task_execution_order, test/test_autograd.py::TestAutograd::test_current_graph_task_id, test/test_autograd.py::TestAutograd::test_current_node, test/test_autograd.py::TestAutograd::test_custom_autograd_ac_early_stop, test/test_autograd.py::TestAutograd::test_custom_autograd_no_early_free, test/test_autograd.py::TestAutograd::test_custom_autograd_repeated_grad_grad, test/test_autograd.py::TestAutograd::test_custom_function_cycle, test/test_autograd.py::TestAutograd::test_custom_function_error, 
test/test_autograd.py::TestAutograd::test_custom_function_exception, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_forward_is_no_op, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_inplace_checks, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_non_differentiable, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_view_checks, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_wrong_formula, test/test_autograd.py::TestAutograd::test_custom_function_inplace_on_non_default_view, test/test_autograd.py::TestAutograd::test_custom_function_inplace_on_view_of_leaf, test/test_autograd.py::TestAutograd::test_custom_function_local_inplace, test/test_autograd.py::TestAutograd::test_custom_function_mark_dirty_not_differentiable, test/test_autograd.py::TestAutograd::test_custom_function_mark_output_view_of_intermediate, test/test_autograd.py::TestAutograd::test_custom_function_no_tensors, test/test_autograd.py::TestAutograd::test_custom_function_non_tensor_inputs_outputs, test/test_autograd.py::TestAutograd::test_custom_function_preserve_torch_function_when_return_as_is, test/test_autograd.py::TestAutograd::test_custom_function_return_view_in_nograd, test/test_autograd.py::TestAutograd::test_custom_function_save_for_forward, test/test_autograd.py::TestAutograd::test_custom_function_saved_tensors, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_multi_input, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_multi_output, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_simple, test/test_autograd.py::TestAutograd::test_custom_function_vmap_defaults, test/test_autograd.py::TestAutograd::test_deep_reentrant, test/test_autograd.py::TestAutograd::test_default_saved_tensors_hooks_double_backward, 
test/test_autograd.py::TestAutograd::test_dep_nograd, test/test_autograd.py::TestAutograd::test_dependent_backward, test/test_autograd.py::TestAutograd::test_detach, test/test_autograd.py::TestAutograd::test_detach_base, test/test_autograd.py::TestAutograd::test_detach_then_inplace_raises_in_autograd, test/test_autograd.py::TestAutograd::test_diagonal_expanded_v, test/test_autograd.py::TestAutograd::test_dir, test/test_autograd.py::TestAutograd::test_disabling_saved_tensor_hooks, test/test_autograd.py::TestAutograd::test_disabling_saved_tensor_hooks_nested, test/test_autograd.py::TestAutograd::test_dont_materialize_grads, test/test_autograd.py::TestAutograd::test_duplicate_backward_root, test/test_autograd.py::TestAutograd::test_enable_grad_decorator_no_paren, test/test_autograd.py::TestAutograd::test_first_grad_fn_access_in_no_grad_mode, test/test_autograd.py::TestAutograd::test_free_deep_graph, test/test_autograd.py::TestAutograd::test_free_deep_graph_complicated, test/test_autograd.py::TestAutograd::test_free_deep_graph_pyfunction, test/test_autograd.py::TestAutograd::test_full_backward_hook_double_backward, test/test_autograd.py::TestAutograd::test_function, test/test_autograd.py::TestAutograd::test_function_returns_input, test/test_autograd.py::TestAutograd::test_function_returns_undefined_tensor, test/test_autograd.py::TestAutograd::test_gc_in_destructor, test/test_autograd.py::TestAutograd::test_grad, test/test_autograd.py::TestAutograd::test_grad_badcalls, test/test_autograd.py::TestAutograd::test_grad_batched_grad, test/test_autograd.py::TestAutograd::test_grad_empty_inputs, test/test_autograd.py::TestAutograd::test_grad_fn_attr_bindings, test/test_autograd.py::TestAutograd::test_grad_fn_badcalls, test/test_autograd.py::TestAutograd::test_grad_fn_input_metadata, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks_multiple_outputs, 
test/test_autograd.py::TestAutograd::test_grad_fn_prehooks_remove_hooks, test/test_autograd.py::TestAutograd::test_grad_materialize_grads, test/test_autograd.py::TestAutograd::test_grad_mode_class_decoration, test/test_autograd.py::TestAutograd::test_grad_mode_restored_reentrant, test/test_autograd.py::TestAutograd::test_grad_nonleaf, test/test_autograd.py::TestAutograd::test_grad_nonleaf_many_outputs, test/test_autograd.py::TestAutograd::test_grad_nonleaf_register_hook, test/test_autograd.py::TestAutograd::test_grad_to_node, test/test_autograd.py::TestAutograd::test_grad_to_node_inplace, test/test_autograd.py::TestAutograd::test_grad_to_node_materialize, test/test_autograd.py::TestAutograd::test_grad_to_node_multi, test/test_autograd.py::TestAutograd::test_grad_to_node_set, test/test_autograd.py::TestAutograd::test_grad_unreachable, test/test_autograd.py::TestAutograd::test_grad_unreachable_discovery, test/test_autograd.py::TestAutograd::test_gradcheck_backward_mul_by_grad_output, test/test_autograd.py::TestAutograd::test_gradcheck_check_batched_grad, test/test_autograd.py::TestAutograd::test_gradcheck_check_forward_or_backward_only, test/test_autograd.py::TestAutograd::test_gradcheck_check_no_differentiable_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_complex_non_complex_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_custom_error, test/test_autograd.py::TestAutograd::test_gradcheck_default_device_placement_context, test/test_autograd.py::TestAutograd::test_gradcheck_dense_and_sparse_inputs, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_batched_grad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/test_autograd.py::TestAutograd::test_gradcheck_get_analytical_jacobian, 
test/test_autograd.py::TestAutograd::test_gradcheck_get_numerical_jacobian, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout0, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout1, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout2, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout3, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout4, test/test_autograd.py::TestAutograd::test_gradcheck_jacobian_mismatch, test/test_autograd.py::TestAutograd::test_gradcheck_multiple_mkldnn_inputs, test/test_autograd.py::TestAutograd::test_gradcheck_nondeterministic, test/test_autograd.py::TestAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/test_autograd.py::TestAutograd::test_gradcheck_single_input, test/test_autograd.py::TestAutograd::test_gradcheck_test_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_undefined_grad, test/test_autograd.py::TestAutograd::test_gradcheck_validates_input_mkldnn, test/test_autograd.py::TestAutograd::test_gradcheck_validates_inputs, test/test_autograd.py::TestAutograd::test_gradient_edge_graph_ownership, test/test_autograd.py::TestAutograd::test_gradient_edge_output, test/test_autograd.py::TestAutograd::test_graph_save_on_cpu, test/test_autograd.py::TestAutograd::test_graph_save_on_cpu_cuda, test/test_autograd.py::TestAutograd::test_hessian_vector, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_False, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_True, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_False, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_True, test/test_autograd.py::TestAutograd::test_hook_edge_case_when_called_with_grad, test/test_autograd.py::TestAutograd::test_hook_none, 
test/test_autograd.py::TestAutograd::test_hook_with_no_name, test/test_autograd.py::TestAutograd::test_hooks, test/test_autograd.py::TestAutograd::test_hooks_cpp, test/test_autograd.py::TestAutograd::test_increment_version, test/test_autograd.py::TestAutograd::test_index_backward_does_not_save_tensor, test/test_autograd.py::TestAutograd::test_indexing, test/test_autograd.py::TestAutograd::test_indexing_duplicates, test/test_autograd.py::TestAutograd::test_inplace, test/test_autograd.py::TestAutograd::test_inplace_not_requires_grad, test/test_autograd.py::TestAutograd::test_inplace_on_view_backward, test/test_autograd.py::TestAutograd::test_inplace_on_view_leaf_errors, test/test_autograd.py::TestAutograd::test_inplace_on_view_saved_output, test/test_autograd.py::TestAutograd::test_inplace_on_view_weak_grad_fn, test/test_autograd.py::TestAutograd::test_input_buffer_accum, test/test_autograd.py::TestAutograd::test_integer_outputs, test/test_autograd.py::TestAutograd::test_invalid_gradients, test/test_autograd.py::TestAutograd::test_isolated_node, test/test_autograd.py::TestAutograd::test_leaf_assignment, test/test_autograd.py::TestAutograd::test_legacy_function_deprecation_exception, test/test_autograd.py::TestAutograd::test_lobpcg, test/test_autograd.py::TestAutograd::test_mark_non_differentiable, test/test_autograd.py::TestAutograd::test_mark_non_differentiable_mixed, test/test_autograd.py::TestAutograd::test_mark_non_differentiable_none, test/test_autograd.py::TestAutograd::test_materialize_grads, test/test_autograd.py::TestAutograd::test_multi_backward, test/test_autograd.py::TestAutograd::test_multi_backward_no_grad, test/test_autograd.py::TestAutograd::test_multi_grad_all_hooks, test/test_autograd.py::TestAutograd::test_multi_grad_any_hooks, test/test_autograd.py::TestAutograd::test_multi_grad_hooks_invalid_mode, test/test_autograd.py::TestAutograd::test_multiple_insert_removal_caching, test/test_autograd.py::TestAutograd::test_named_tensor_for_complex_views, 
test/test_autograd.py::TestAutograd::test_naughty_anomaly_access, test/test_autograd.py::TestAutograd::test_naughty_autograd_function_attribute_access, test/test_autograd.py::TestAutograd::test_naughty_autograd_function_stashing_ctx, test/test_autograd.py::TestAutograd::test_nested_anomaly_detect_nan, test/test_autograd.py::TestAutograd::test_nested_anomaly_printstack_cleanup, test/test_autograd.py::TestAutograd::test_next_functions, test/test_autograd.py::TestAutograd::test_no_grad, test/test_autograd.py::TestAutograd::test_no_grad_assignment, test/test_autograd.py::TestAutograd::test_no_grad_copy, test/test_autograd.py::TestAutograd::test_no_grad_copy_sparse, test/test_autograd.py::TestAutograd::test_no_grad_input, test/test_autograd.py::TestAutograd::test_no_grad_modifies_version, test/test_autograd.py::TestAutograd::test_no_grad_python_function, test/test_autograd.py::TestAutograd::test_no_requires_grad_inplace, test/test_autograd.py::TestAutograd::test_no_unnecessary_save, test/test_autograd.py::TestAutograd::test_no_unnecessary_unwrapping, test/test_autograd.py::TestAutograd::test_node_ordering_when_none_returned, test/test_autograd.py::TestAutograd::test_node_post_hook_registered_during_unpack_hook, test/test_autograd.py::TestAutograd::test_not_implemented_fwad, test/test_autograd.py::TestAutograd::test_not_implemented_grad, test/test_autograd.py::TestAutograd::test_numpy_requires_grad, test/test_autograd.py::TestAutograd::test_once_differentiable, test/test_autograd.py::TestAutograd::test_out_variant_raises_when_inputs_require_grad, test/test_autograd.py::TestAutograd::test_pack_hook_with_inplace_modification_should_fail, test/test_autograd.py::TestAutograd::test_pickle, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_e2e, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_multiple_hooks, 
test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_multiple_tensors, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_on_non_leaf, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_ordering, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_returns_not_None, test/test_autograd.py::TestAutograd::test_pow_zero_tensor_gradient, test/test_autograd.py::TestAutograd::test_power_function, test/test_autograd.py::TestAutograd::test_prehook_ordering, test/test_autograd.py::TestAutograd::test_profiler, test/test_autograd.py::TestAutograd::test_profiler_aggregation_fake, test/test_autograd.py::TestAutograd::test_profiler_aggregation_lstm, test/test_autograd.py::TestAutograd::test_profiler_aggregation_table, test/test_autograd.py::TestAutograd::test_profiler_function_event_avg, test/test_autograd.py::TestAutograd::test_profiler_propagation, test/test_autograd.py::TestAutograd::test_profiler_seq_nr, test/test_autograd.py::TestAutograd::test_profiler_shapes, test/test_autograd.py::TestAutograd::test_profiler_unboxed_only, test/test_autograd.py::TestAutograd::test_pynode_destruction_deadlock, test/test_autograd.py::TestAutograd::test_record_function, test/test_autograd.py::TestAutograd::test_record_function_callbacks, test/test_autograd.py::TestAutograd::test_record_function_legacy, test/test_autograd.py::TestAutograd::test_record_function_multithreaded, test/test_autograd.py::TestAutograd::test_reentrant_child_error, test/test_autograd.py::TestAutograd::test_reentrant_priority, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_both_depths, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_depth_0, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_depth_1, test/test_autograd.py::TestAutograd::test_reentrant_with_leaf_variable_hook, test/test_autograd.py::TestAutograd::test_reentrant_with_non_leaf_variable_hook, 
test/test_autograd.py::TestAutograd::test_requires_grad, test/test_autograd.py::TestAutograd::test_requires_grad_, test/test_autograd.py::TestAutograd::test_requires_grad_inplace, test/test_autograd.py::TestAutograd::test_retain_grad, test/test_autograd.py::TestAutograd::test_retain_grad_cycle, test/test_autograd.py::TestAutograd::test_retain_grad_inplace, test/test_autograd.py::TestAutograd::test_retain_grad_inplace_over_view, test/test_autograd.py::TestAutograd::test_retains_grad_can_always_observe_tensor_prehook, test/test_autograd.py::TestAutograd::test_retains_grad_inplace_multiple_outputs, test/test_autograd.py::TestAutograd::test_return_duplicate, test/test_autograd.py::TestAutograd::test_return_duplicate_inplace, test/test_autograd.py::TestAutograd::test_return_leaf, test/test_autograd.py::TestAutograd::test_return_leaf_inplace, test/test_autograd.py::TestAutograd::test_save_none_for_backward, test/test_autograd.py::TestAutograd::test_save_on_cpu_and_checkpoint, test/test_autograd.py::TestAutograd::test_save_output_nr, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_custom_error_propagation, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_extra_exit_during_bw_no_crash, test/test_autograd.py::TestAutograd::test_saved_tensors_hook_version_counter_not_shared, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_default_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_saved_original_with_hooks, 
test/test_autograd.py::TestAutograd::test_saved_variable_saved_original_inplace_detach, test/test_autograd.py::TestAutograd::test_saved_variable_version_counter, test/test_autograd.py::TestAutograd::test_saved_variables_deprecated, test/test_autograd.py::TestAutograd::test_saving_variable_to_disk, test/test_autograd.py::TestAutograd::test_scalar_grad_mixed_device, test/test_autograd.py::TestAutograd::test_select_expanded_v, test/test_autograd.py::TestAutograd::test_select_sum, test/test_autograd.py::TestAutograd::test_set_data_preserve_pyobj, test/test_autograd.py::TestAutograd::test_set_data_self_requires_grad, test/test_autograd.py::TestAutograd::test_set_data_tensorimpl_type, test/test_autograd.py::TestAutograd::test_set_grad_coroutines, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_benign_exceptions, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_critical_exceptions, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_exit, test/test_autograd.py::TestAutograd::test_set_grad_enabled, test/test_autograd.py::TestAutograd::test_set_grad_enabled_wraps, test/test_autograd.py::TestAutograd::test_set_grad_generator_functions, test/test_autograd.py::TestAutograd::test_set_grad_generator_functions_recursive, test/test_autograd.py::TestAutograd::test_set_materialize_non_diff_grads, test/test_autograd.py::TestAutograd::test_setitem, test/test_autograd.py::TestAutograd::test_setitem_mask, test/test_autograd.py::TestAutograd::test_setting_default_saved_variable_hooks_twice_should_not_fail, test/test_autograd.py::TestAutograd::test_setting_default_saved_variable_hooks_twice_should_use_inner, test/test_autograd.py::TestAutograd::test_setup_context_when_forward_has_default_args, test/test_autograd.py::TestAutograd::test_shape, test/test_autograd.py::TestAutograd::test_sharded_grad, test/test_autograd.py::TestAutograd::test_simple_reentrant, test/test_autograd.py::TestAutograd::test_slice_expanded_v, 
test/test_autograd.py::TestAutograd::test_sparse_gather_both_scalar, test/test_autograd.py::TestAutograd::test_sparse_gather_dim0, test/test_autograd.py::TestAutograd::test_sparse_gather_dim1, test/test_autograd.py::TestAutograd::test_sparse_gather_dim_neg, test/test_autograd.py::TestAutograd::test_sparse_gather_ind_scalar, test/test_autograd.py::TestAutograd::test_sparse_gather_x_scalar, test/test_autograd.py::TestAutograd::test_sparse_mm_backward, test/test_autograd.py::TestAutograd::test_tensor_grad_warnings, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace_multiple_outputs, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace_over_view, test/test_autograd.py::TestAutograd::test_thread_shutdown, test/test_autograd.py::TestAutograd::test_to_sparse_backward, test/test_autograd.py::TestAutograd::test_too_many_grads, test/test_autograd.py::TestAutograd::test_type_conversions, test/test_autograd.py::TestAutograd::test_unpack_hooks_exec_count, test/test_autograd.py::TestAutograd::test_unrelated_inputs, test/test_autograd.py::TestAutograd::test_unsafe_set_version_counter, test/test_autograd.py::TestAutograd::test_unused_output, test/test_autograd.py::TestAutograd::test_var_mean_differentiable, test/test_autograd.py::TestAutograd::test_variable_traverse, test/test_autograd.py::TestAutograd::test_version_counter, test/test_autograd.py::TestAutograd::test_view_func_replay, test/test_autograd.py::TestAutograd::test_view_func_replay_with_modified_state, test/test_autograd.py::TestAutograd::test_view_replay_enabled, test/test_autograd.py::TestAutograd::test_volatile_deprecated, test/test_autograd.py::TestAutograd::test_will_engine_execute_node, test/test_autograd.py::TestAutograd::test_wrapped_number_saved_tensors_hooks, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_inplace_on_view_not_same_layout, 
test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_inplace_on_view_same_layout, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_metadata_check_for_storage_numel_skipped, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_out_of_place_basic, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_out_of_place_not_same_layout, test/test_autograd.py::TestAutogradForwardMode::test_advanced_packing_unpacking, test/test_autograd.py::TestAutogradForwardMode::test_backward_graph_destruction, test/test_autograd.py::TestAutogradForwardMode::test_basic_packing_unpacking, test/test_autograd.py::TestAutogradForwardMode::test_codegen_ignores_undefined_outputs, test/test_autograd.py::TestAutogradForwardMode::test_create_new_zeros_with_same_meta, test/test_autograd.py::TestAutogradForwardMode::test_default_level, test/test_autograd.py::TestAutogradForwardMode::test_detach_view_tracking, test/test_autograd.py::TestAutogradForwardMode::test_forward_level_cleanup, test/test_autograd.py::TestAutogradForwardMode::test_fwd_grad_enabled, test/test_autograd.py::TestAutogradForwardMode::test_grad_cleanup, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_forbid_integral_dtype, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_inference_tensor_in_inference_mode, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_torch_dispatch, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_check_conj, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_checks_ignores_size_zero, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_checks_storage_numel, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_ignore_storage_offset_for_zero_numel_tensor, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_when_primal_has_conj_bit, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_when_primal_has_neg_bit, 
test/test_autograd.py::TestAutogradForwardMode::test_nested_level, test/test_autograd.py::TestAutogradForwardMode::test_non_differentiable, test/test_autograd.py::TestAutogradForwardMode::test_out_variant, test/test_autograd.py::TestAutogradForwardMode::test_print, test/test_autograd.py::TestAutogradForwardMode::test_set_fw_grad_having_own_fw_grad_at_same_level, test/test_autograd.py::TestAutogradForwardMode::test_set_fwd_grad_enabled, test/test_autograd.py::TestAutogradForwardMode::test_size_check, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_always_creates_a_view, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_differentiable_views, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_non_differentiable_views, test/test_autograd.py::TestAllowMutationOnSaved::test_backward_out_of_context, test/test_autograd.py::TestAllowMutationOnSaved::test_basic, test/test_autograd.py::TestAllowMutationOnSaved::test_disallow_nesting, test/test_autograd.py::TestAllowMutationOnSaved::test_double_backward, test/test_autograd.py::TestAllowMutationOnSaved::test_inplace_foreach, test/test_autograd.py::TestAllowMutationOnSaved::test_save_base_and_modify_view, test/test_autograd.py::TestAllowMutationOnSaved::test_save_view_modify_base, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_but_not_anymore, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_same_tensor_different_versions, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_same_tensor_many_times, test/test_autograd.py::TestAllowMutationOnSaved::test_views, test/test_autograd.py::TestAllowMutationOnSaved::test_with_math_views, test/test_autograd.py::TestAllowMutationOnSaved::test_with_out_variant, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_context_manager, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_decorator, 
test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_existing_autograd_session, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_handle_direct_view_on_rebase, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_handle_indirect_view_on_rebase, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_tensor_creation, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_inplace_output_in_inference_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_inplace_output_in_normal_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_view_output_in_inference_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_view_output_in_normal_mode, test/test_autograd.py::TestAutogradStreamSynchronization::test_consumer_to_multi_producer_case_4_correctness, test/test_autograd.py::TestAutogradStreamSynchronization::test_consumer_to_single_producer_case_2_correctness, 
test/test_autograd.py::TestAutogradStreamSynchronization::test_consumer_to_single_producer_case_3_correctness, test/test_autograd.py::TestAutogradStreamSynchronization::test_consumer_to_single_producer_case_3_correctness_non_default_ambient_stream, test/test_autograd.py::TestAutogradStreamSynchronization::test_consumer_to_single_producer_case_4_correctness, test/test_autograd.py::TestAutogradStreamSynchronization::test_side_stream_backward_overlap, test/test_autograd.py::TestMultithreadAutograd::test_cat_stack_r_to_c, test/test_autograd.py::TestMultithreadAutograd::test_custom_function_propagates_errors_from_device_thread, test/test_autograd.py::TestMultithreadAutograd::test_dataparallel_saved_tensors_hooks, test/test_autograd.py::TestMultithreadAutograd::test_fork_join_in_middle, test/test_autograd.py::TestMultithreadAutograd::test_multi_grad_all_hooks, test/test_autograd.py::TestMultithreadAutograd::test_multi_grad_any_hooks, test/test_autograd.py::TestMultithreadAutograd::test_multithreaded_exception_propagation, test/test_autograd.py::TestMultithreadAutograd::test_preserve_backtrace, test/test_autograd.py::TestMultithreadAutograd::test_python_thread_in_middle, test/test_autograd.py::TestMultithreadAutograd::test_set_multithreading_enabled_as_context_manager_and_function, test/test_autograd.py::TestMultithreadAutograd::test_simple_backward, test/test_autograd.py::TestMultithreadAutograd::test_simple_backward_same_input, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_kwargs_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_kwargs_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_False, 
test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_reentrant_backwards_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_same_graph_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_same_graph_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_set_early_stop, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_set_early_stop_no_recompution_needed, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_two_children_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_two_children_early_stop_True, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_bad_inputs, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_can_only_trigger_recompute_once, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_flops_and_mem, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_function_with_more_than_one_output, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_function_with_non_tensor_output, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_output_already_has_autograd_meta, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_policy_with_state, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_storage_lifetime, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_subclass_dispatching_sizes, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_version_counter, test/test_autograd.py::TestAutogradComplex::test_view_func_for_complex_views, test/test_autograd.py::TestAutogradComplex::test_view_with_multi_output, 
test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_cuda_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_cuda_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_vectorize_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_vectorize_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_match_vhp_hvp_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_match_vhp_hvp_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_no_grad_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_hessian_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_multi_input_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_multi_input_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_simple_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_simple_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_unrelated_outputs_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_unrelated_outputs_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_raises_no_warnings_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_raises_no_warnings_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_vectorize_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_vectorize_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_match_vjp_jvp_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jacobian_match_vjp_jvp_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_devices_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_devices_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_dtype_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_dtype_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_multi_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_multi_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_simple_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_simple_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_unrelated_outputs_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_unrelated_outputs_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_zero_dim_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_zero_dim_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_raises_no_warnings_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_raises_no_warnings_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_scalar_logging_tensor, test/test_autograd.py::TestAutogradLogging::test_logging, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_advanced_indexing_backwards_large_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_advanced_indexing_backwards_memory_format_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_backward_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_complex_scalar_backward_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy__cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_forward_ad_broadcasting_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_forward_ad_same_layout_copies_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_r_to_c_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_cross_device_reentrant_autograd_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_free_unneeded_tensor_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_grad_assignment_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_gradcheck_input_output_different_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_multiple_output_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_base_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_gradcheck_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_makes_base_require_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_modify_base_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multi_output_safe_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multi_output_unsafe_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multiple_outputs_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_non_contig_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_of_multiple_output_view_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_python_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_then_no_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_undefined_grad_output_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inputbuffer_add_multidevice_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_min_max_median_backprops_to_all_values_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_mv_grad_stride_0_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_non_differentiable_ops_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_parameter_resize_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pin_memory_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pow_real_negative_base_complex_exponent_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_profiler_emit_itt_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_profiler_emit_nvtx_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pyscalar_conversions_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_reentrant_parent_error_on_cpu_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_requires_grad_factory_cuda_float32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_requires_grad_factory_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_resize_version_bump_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_rnn_backward_to_input_but_not_parameters_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_scatter_index_reduce_amin_amax_backprops_to_all_values_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_scatter_index_reduce_prod_gradgrad_error_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float16, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int16, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int8, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_simple_reentrant_cross_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_backward_cuda_complex128, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_backward_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_ctor_getter_backward_cuda_complex128, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_ctor_getter_backward_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_mask_autograd_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_strided_leaf_grad_layout_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_to_r_to_c_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_unused_output_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_warning_in_backward_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_where_functional_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_where_scalar_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_zero_dim_param_mixed_device_grad_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_autograd_composite_implicit_and_dispatch_registration_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_autograd_multiple_dispatch_registrations_cuda, 
test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_backward_single_threaded_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_backward_tls_stash_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_foward_mode_AD_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_is_retain_graph_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_per_dispatch_key_input_saving_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_set_sequence_nr_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_view_copy_cuda 2025-09-07T07:47:16.8041484Z 2025-09-07T07:47:16.8041687Z Running test_sparse_semi_structured 1/1 ... [2025-09-07 07:47:16.771104] 2025-09-07T07:47:16.8042070Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:16.8042990Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_semi_structured.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:16.771463] 2025-09-07T07:47:18.7001149Z 2025-09-07T07:47:18.7001909Z test_public_bindings 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_public_bindings_1.1_4996e8e909ac026d_.log 2025-09-07T07:47:18.7003994Z Running 4 items in this shard: test/test_public_bindings.py::TestPublicBindings::test_correct_module_names, test/test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported, test/test_public_bindings.py::TestPublicBindings::test_no_new_bindings, test/test_public_bindings.py::TestPublicBindings::test_no_new_reexport_callables 2025-09-07T07:47:18.7005437Z 2025-09-07T07:47:18.7005721Z Running dynamo/test_input_attr_tracking 1/1 ... 
[2025-09-07 07:47:18.700195] 2025-09-07T07:47:18.7006205Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:18.7007896Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_input_attr_tracking.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:18.700599] 2025-09-07T07:47:20.8054870Z 2025-09-07T07:47:20.8055928Z dynamo/test_exc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_exc_1.1_75c0cf1ec0969c31_.log 2025-09-07T07:47:20.8059261Z Running 10 items in this shard: test/dynamo/test_exc.py::ExcTests::test_backend_suppress_line, test/dynamo/test_exc.py::ExcTests::test_graph_break_log, test/dynamo/test_exc.py::ExcTests::test_graph_break_log_generic_jump, test/dynamo/test_exc.py::ExcTests::test_internal_error_no_suppress, test/dynamo/test_exc.py::ExcTests::test_internal_error_suppress_errors, test/dynamo/test_exc.py::ExcTests::test_not_implemented_error, test/dynamo/test_exc.py::ExcTests::test_trigger_bisect_on_error, test/dynamo/test_exc.py::ExcTests::test_trigger_on_error, test/dynamo/test_exc.py::ExcTests::test_unsupported_error, test/dynamo/test_exc.py::ExcTests::test_unsupported_real_stack 2025-09-07T07:47:20.8061595Z 2025-09-07T07:47:20.8061814Z Running functorch/test_control_flow 1/1 ... [2025-09-07 07:47:20.805516] 2025-09-07T07:47:20.8062210Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:20.8063168Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_control_flow.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:20.805857] 2025-09-07T07:47:22.8209526Z 2025-09-07T07:47:22.8210538Z dynamo/test_input_attr_tracking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_input_attr_tracking_1.1_a237d3a67ca190d2_.log 2025-09-07T07:47:22.8217192Z Running 12 items in this shard: test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_complex_attr_access_with_graph_breaks, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_complex_attr_access_with_inline_reconstruct, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_complex_attr_access_without_graph_breaks, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_const_property_assigned_on_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_const_property_on_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_guards_correctly_property_assigned_on_tensor_type_change, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_guards_correctly_property_assigned_on_tensor_type_change_inductor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_set_data_on_input_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_set_data_on_scoped_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_set_data_on_user_defined_class_input_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_tensor_property_assigned_on_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_tensor_property_on_tensor 2025-09-07T07:47:22.8222319Z 2025-09-07T07:47:22.8222515Z Running test_matmul_cuda 1/1 ... 
[2025-09-07 07:47:22.820957] 2025-09-07T07:47:22.8222895Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:22.8223848Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_matmul_cuda.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:22.821327] 2025-09-07T07:47:24.1965779Z 2025-09-07T07:47:24.1966622Z test_sparse_semi_structured 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_semi_structured_1.1_a30c13bc7e9178f6_.log 2025-09-07T07:47:24.1984016Z Running 42 items in this shard: test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cusparselt, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cutlass, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_sp24_compile, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_indices, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_linear, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_min_sparse_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mlp, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_TN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_to_sparse_semi_structured, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dim, 
test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_values, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_gemm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_edge_case1, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_id, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_meta_correctness, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_prune_dense_static_sort, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pruning_algo_largest_abs_values_greedy, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply_dense, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_bmm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_mat_vec, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions_all_patterns, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_linear_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_sparse_semi_structured_ops_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_compile_autotune, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_search, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_csrc_cslt_sparse_mm_search, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cusparselt_backend, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_fp8fp8_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm_fp8 2025-09-07T07:47:24.1997967Z 2025-09-07T07:47:24.1998135Z Running test_dataloader 1/2 ... [2025-09-07 07:47:24.196719] 2025-09-07T07:47:24.1998487Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:24.1999376Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dataloader.py', '-m', 'not serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:24.197084] 2025-09-07T07:47:27.3695914Z 2025-09-07T07:47:27.3696874Z test_decomp 19/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_19.22_2dbb057c93f57819_.log 2025-09-07T07:47:27.3806482Z Running 427 items in this shard: test/test_decomp.py::TestDecompCUDA::test_batch_norm_unflatten_weight_bias_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rand___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float8_e5m2, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_triangular_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_kaiser_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_mm_reduce_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_bernoulli_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_5_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_zero__cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_LSTM_train_mode_cuda_float64, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_bfloat16 2025-09-07T07:47:27.3912241Z 2025-09-07T07:47:27.3912411Z Running test_dataloader 2/2 ... [2025-09-07 07:47:27.370171] 2025-09-07T07:47:27.3912768Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:27.3913797Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dataloader.py', '-m', 'not serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:27.370489] 2025-09-07T07:47:27.9301665Z 2025-09-07T07:47:27.9303020Z functorch/test_control_flow 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_control_flow_1.1_cb5cd96a7d6352e6_.log 2025-09-07T07:47:27.9951538Z Running 1339 items in this shard: test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_complex, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_different_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_grad_through_cond, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_inner_fn, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_inner_tensor, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_mixed_require_grad, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_nested, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_pytree_input, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_pytree_not_all_inputs_used, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_same_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_simple, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_torch_nn_module, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_user_nn_module, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_zeros_unused_branch_complex_compile_fail_compile_mode_compile_dynamic_shape_scalar_False, test/functorch/test_control_flow.py::TestControlFlow::test_cond_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_cond_in_forloop, test/functorch/test_control_flow.py::TestControlFlow::test_cond_no_trace, 
test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_higher_order, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_nested_list, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_no_grad_output, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_simple, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_simple_partial_grad, test/functorch/test_control_flow.py::TestControlFlow::test_map_dict_in_out, test/functorch/test_control_flow.py::TestControlFlow::test_map_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_map_illegal_inputs, test/functorch/test_control_flow.py::TestControlFlow::test_map_illegal_outputs, test/functorch/test_control_flow.py::TestControlFlow::test_map_list_in_out, test/functorch/test_control_flow.py::TestControlFlow::test_scan_associative_scan, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_carry_carry_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_carry_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_eager_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_eager_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_none_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_none_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_additional_inputs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_additional_inputs_cuda, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_init_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_xs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_additional_inputs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_init_cpu, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_xs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_additional_inputs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_init_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_random_cuda, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_xs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_additional_inputs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_init_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_xs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_eager_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_False_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_False_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cpu_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_float16, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int32, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_float_output, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_non_tensor, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_scanned_0, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_carry_shape, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_False_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_False_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_True_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_init_longer_carry, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_init_shorter_carry, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_shape, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_carry_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_mutation, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_1_device_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_1_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_2_device_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_2_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_3_device_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_3_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_one_return, test/functorch/test_control_flow.py::TestControlFlow::test_scan_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph, test/functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph_wrong_dtype, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_wrong_pytree, test/functorch/test_control_flow.py::TestControlFlow::test_scan_y_less_ndim_then_dim, test/functorch/test_control_flow.py::TestControlFlow::test_while_loop_gpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_combine_fn_wrong_meta_in_combine_fn, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_True_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_wrong_dim, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_generic_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_generic_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_none_combine_mode_generic_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_none_combine_mode_generic_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_none_combine_mode_pointwise_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_none_combine_mode_pointwise_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_shape_failure, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_True_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_True_same_direction_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_True_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_generic_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_pointwise, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cpu_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cpu_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cpu_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_pointwise, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_generic, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_input_mutation, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_input_output_alias, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_failure, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_map_in_combine_fn, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_nested, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_dynamic_shape_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_dynamic_shape_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_none_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_none_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_dynamic_shape_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_dynamic_shape_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_output_output_alias, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_pytree_output, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_sparse_tensor, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_accepts_torch_function_as_inputs, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_autograd_backward, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_eager_run_with_item, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_aot_func_check_functional, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_data_dependent_pred, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_aliasing_with_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_false_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_true_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested_input_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested_input_mutation_with_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_output_alias_input, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_make_fx_preserve_stack_trace_for_nodes_in_subgraph, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_merge_graph_preserves_ph_meta, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_strided_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_strided_output_dynamic_False_backend_eager, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_strided_output_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_strided_output_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_multi, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_multi_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_other_inputs, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_other_inputs_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_with_closure_graph_module, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_no_dynamo_cache_limit, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_retrace_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_simple_with_linear_compile_check_graph, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_subgraph_same_shape_env_as_parent, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_symint_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_symint_operands_requires_grad_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_symint_operands_requires_grad_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_trace_set__and_mutate_input, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_trace_set__and_mutate_intermediate, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_traced_not_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_traced_not_nested_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_1_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_unbacked_symint_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_args_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_inputs, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_1, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_single_input_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_consecutive_make_fx_symbolic, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_module_param_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_module_python_scalar_closure, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_sym_pred, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_tensor_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_tensor_closure_graph_module, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_unbacked_sym_pred, test/functorch/test_control_flow.py::TestControlFlowTraced::test_hop_raises_if_not_overriding_call, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_input_alias, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_mutation_inference_mode_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_mutation_inference_mode_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_output_alias, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_arg_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_elem_alias, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_elem_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_unfunc_boolean_tensor_for_nested_map_cond, test/functorch/test_control_flow.py::TestControlFlowTraced::test_merge_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_nested_cond_map_cond_symbolic, test/functorch/test_control_flow.py::TestControlFlowTraced::test_nested_map_cond_real, test/functorch/test_control_flow.py::TestControlFlowTraced::test_nested_map_cond_symbolic, test/functorch/test_control_flow.py::TestControlFlowTraced::test_raise_error_on_mismatch_tensor_size, test/functorch/test_control_flow.py::TestControlFlowTraced::test_raise_error_on_mismatch_tensor_size_fake_tensor, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_raise_error_on_mismatch_type_size, test/functorch/test_control_flow.py::TestControlFlowTraced::test_raise_error_on_mismatch_type_size_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_functionalized_elem_alias, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_functionalized_elem_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_pytree_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_autograd_aot_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_autograd_symbolic_dict, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_autograd_symbolic_list, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_autograd_symbolic_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_real, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_dict, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_list, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_two_hops_not_sharing_code_obj, test/functorch/test_control_flow.py::TestControlFlowTraced::test_vmap_vmap_boolcond_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_vmap_vmap_boolcond_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_autograd_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_int_carry, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_pytree_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_simple_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_pytree_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_nested, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested2_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_compile_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_compile_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_compile_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_compile_dynamic_True_backend_eager, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_compile_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_compile_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_compile_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_compile_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_mismatch_in_meta, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_False_backend_eager, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_cpp, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_functorch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_no, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_python, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_with_linear_compile_check_graph, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_unbacked_bindings, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_multiple_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_tensor_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_cond_gen_schema_symbool_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_cond_gen_schema_tensor_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_function_schema_gen, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_GraphModule, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_SymBool, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_SymInt, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_Tensor, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_bool, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_int, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_str, test/functorch/test_control_flow.py::TestHopSchema::test_scan_gen_schema_multiple_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_scan_gen_schema_tensor_inputs, 
test/functorch/test_control_flow.py::TestHopSchema::test_scan_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_schema_tree_spec, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_GraphModule, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymBool, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymInt, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_Tensor, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_bool, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_int, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_str, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_tensor_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_input_mutation, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_int_carries 2025-09-07T07:47:28.0566934Z 2025-09-07T07:47:28.0567115Z Running optim/test_swa_utils 1/1 ... [2025-09-07 07:47:27.933350] 2025-09-07T07:47:28.0567476Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:28.0568383Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_swa_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:27.933694] 2025-09-07T07:47:28.8944527Z 2025-09-07T07:47:28.8945504Z test_matmul_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_matmul_cuda_1.1_796d917271c554f7_.log 2025-09-07T07:47:28.9472071Z Running 1183 items in this shard: test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_32_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_32_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_32_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_32_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_32_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_32_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_alignment_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_4_size_32768_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_4_size_32768_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_8_size_32768_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_8_size_32768_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublas_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublaslt_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublas_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublaslt_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublas_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublaslt_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublas_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublaslt_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_and_lt_reduced_precision_fp16_accumulate_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_100_100_100_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_100_100_100_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_100_100_100_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_32_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMixedDtypesLinearCudaCUDA::test_mixed_dtypes_linear_cuda_bfloat16, test/test_matmul_cuda.py::TestMixedDtypesLinearCudaCUDA::test_mixed_dtypes_linear_cuda_float16, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_compile_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_nvfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp4_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_compile_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_error_message_fp8_pre_sm89_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float32_output_errors_with_bias_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_basics_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_relu_edgecase_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_error_messages_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_fast_accum_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_honor_sm_carveout_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_True_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_pack_uint4_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_False_strided_False_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_True_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_False_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_True_strided_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_bfloat16_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float16_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float32_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_bfloat16_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_bfloat16_lhs_block_128_rhs_block_1_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_bfloat16_lhs_block_1_rhs_block_128_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_bfloat16_lhs_block_1_rhs_block_1_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_float32_lhs_block_128_rhs_block_1_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_float32_lhs_block_1_rhs_block_128_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_float32_lhs_block_1_rhs_block_1_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float16_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float32_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_bfloat16_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float32_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_False_cuda, 
test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_True_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_True_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_False_cuda, test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_True_cuda 2025-09-07T07:47:28.9977280Z 2025-09-07T07:47:28.9977486Z Running test_xnnpack_integration 2/4 ... [2025-09-07 07:47:28.896622] 2025-09-07T07:47:28.9977872Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:28.9978797Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '-m', 'not serial', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:47:28.896999] 2025-09-07T07:47:28.9979594Z 2025-09-07T07:47:28.9979999Z test_dataloader 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_dataloader_1.2_46766783c259ae9a_.log 2025-09-07T07:47:29.0005060Z Running 96 items in this shard: test/test_dataloader.py::TestDatasetRandomSplit::test_incomplete_fractional_splits, test/test_dataloader.py::TestDatasetRandomSplit::test_slicing_of_subset_of_subset, test/test_dataloader.py::TestDatasetRandomSplit::test_splits_are_mutually_exclusive, test/test_dataloader.py::TestDatasetRandomSplit::test_splits_indexing_type, test/test_dataloader.py::TestTensorDataset::test_getitem, test/test_dataloader.py::TestTensorDataset::test_len, test/test_dataloader.py::TestTensorDataset::test_many_tensors, test/test_dataloader.py::TestStackDataset::test_getitem, test/test_dataloader.py::TestStackDataset::test_getitems, test/test_dataloader.py::TestStackDataset::test_getitems_raises_index_error, test/test_dataloader.py::TestStackDataset::test_getitems_value_error, test/test_dataloader.py::TestStackDataset::test_mixed, test/test_dataloader.py::TestStackDataset::test_single, test/test_dataloader.py::TestStackDataset::test_size_mismatch, test/test_dataloader.py::TestConcatDataset::test_concat_two_non_singletons, test/test_dataloader.py::TestConcatDataset::test_concat_two_non_singletons_with_empty, test/test_dataloader.py::TestConcatDataset::test_iterable_dataset_err, test/test_dataloader.py::TestDataLoader::test_builtin_collection_conversion, test/test_dataloader.py::TestDataLoader::test_default_collate_bad_numpy_types, test/test_dataloader.py::TestDataLoader::test_default_collate_mapping_keep_type, test/test_dataloader.py::TestDataLoader::test_default_collate_sequence_dont_keep_type, test/test_dataloader.py::TestDataLoader::test_default_collate_sequence_keep_type, test/test_dataloader.py::TestDataLoader::test_default_collate_shared_tensor, 
test/test_dataloader.py::TestDataLoader::test_default_convert_sequence_dont_keep_type, test/test_dataloader.py::TestDataLoader::test_duplicating_data_with_drop_last, test/test_dataloader.py::TestDataLoader::test_error, test/test_dataloader.py::TestDataLoader::test_error_in_init, test/test_dataloader.py::TestDataLoader::test_excessive_thread_creation_warning, test/test_dataloader.py::TestDataLoader::test_fd_limit_exceeded, test/test_dataloader.py::TestDataLoader::test_get_worker_info, test/test_dataloader.py::TestDataLoader::test_invalid_assign_after_init, test/test_dataloader.py::TestDataLoader::test_iterable_style_dataset, test/test_dataloader.py::TestDataLoader::test_iterabledataset_len, test/test_dataloader.py::TestDataLoader::test_len, test/test_dataloader.py::TestDataLoader::test_multi_epochs_reproducibility, test/test_dataloader.py::TestDataLoader::test_multiprocessing_contexts, test/test_dataloader.py::TestDataLoader::test_multiprocessing_iterdatapipe_with_dill, test/test_dataloader.py::TestDataLoader::test_no_segfault, test/test_dataloader.py::TestDataLoader::test_numpy, test/test_dataloader.py::TestDataLoader::test_partial_workers, test/test_dataloader.py::TestDataLoader::test_random_sampler, test/test_dataloader.py::TestDataLoader::test_random_sampler_len_with_replacement, test/test_dataloader.py::TestDataLoader::test_random_sampler_len_without_replacement, test/test_dataloader.py::TestDataLoader::test_sampler, test/test_dataloader.py::TestDataLoader::test_sampler_reproducibility, test/test_dataloader.py::TestDataLoader::test_seqential_batch_workers, test/test_dataloader.py::TestDataLoader::test_seqential_batch_workers_prefetch, test/test_dataloader.py::TestDataLoader::test_shuffle, test/test_dataloader.py::TestDataLoader::test_shuffle_batch, test/test_dataloader.py::TestDataLoader::test_worker_init_fn, test/test_dataloader.py::TestDictDataLoader::test_pin_memory_no_cuda, test/test_dataloader.py::TestDictDataLoader::test_pin_memory_with_only_device, 
test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_batch_sampler, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_dataset_not_reset, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_bad_numpy_types, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_dtype, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_mapping_keep_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_numpy_memmap, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_sequence_dont_keep_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_sequence_keep_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_convert_sequence_dont_keep_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_convert_sequence_keep_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_duplicating_data_with_drop_last, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_error, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_get_worker_info, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_invalid_ctor_args_combinations, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_iterable_style_dataset, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_iterabledataset_len, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_large_sampler_indices, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_multiprocessing_iterdatapipe, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_numpy_scalars, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_partial_workers, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_random_sampler, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_segfault, 
test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_seqential_batch_workers, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_seqential_batch_workers_prefetch, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_sequential_workers, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_batch_none, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_batch_workers, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_batch_workers_prefetch, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_pin_memory, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_workers, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_timeout, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_worker_init_fn, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_worker_seed_reproducibility, test/test_dataloader.py::TestNamedTupleDataLoader::test_dataloader_with_namedtuple, test/test_dataloader.py::TestCustomPinFn::test_custom_batch_pin, test/test_dataloader.py::TestConvAfterFork::test_conv_after_fork, test/test_dataloader.py::TestOutOfOrderDataLoader::test_in_order_index_ds, test/test_dataloader.py::TestOutOfOrderDataLoader::test_out_of_order_index_ds, test/test_dataloader.py::TestOutOfOrderDataLoader::test_out_of_order_iterable_ds, test/test_dataloader.py::TestDataLoaderDeviceTypeCUDA::test_nested_tensor_multiprocessing_context_fork_cuda, test/test_dataloader.py::TestDataLoaderDeviceTypeCUDA::test_nested_tensor_multiprocessing_context_spawn_cuda, test/test_dataloader.py::TestDataLoaderDeviceTypeCUDA::test_sparse_tensor_multiprocessing_context_fork_cuda, test/test_dataloader.py::TestDataLoaderDeviceTypeCUDA::test_sparse_tensor_multiprocessing_context_spawn_cuda 2025-09-07T07:47:29.0029856Z 2025-09-07T07:47:29.0030040Z Running test_xnnpack_integration 4/4 ... 
[2025-09-07 07:47:28.968459] 2025-09-07T07:47:29.0030403Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:29.0031306Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '-m', 'not serial', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:28.968825] 2025-09-07T07:47:31.4534713Z 2025-09-07T07:47:31.4535829Z optim/test_swa_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_swa_utils_1.1_65e1a0be82787b1c_.log 2025-09-07T07:47:31.4536793Z 2025-09-07T07:47:31.4538934Z Running test_mkldnn 1/1 ... [2025-09-07 07:47:31.453708] 2025-09-07T07:47:31.4539305Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:31.4543466Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:31.454148] 2025-09-07T07:47:32.7385782Z 2025-09-07T07:47:32.7386931Z test_xnnpack_integration 4/4 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_4.4_91ba8dab8799a013_.log 2025-09-07T07:47:32.7388525Z Running 2 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear 2025-09-07T07:47:32.7389319Z 2025-09-07T07:47:32.7389527Z Running test_linalg 2/3 ... 
[2025-09-07 07:47:32.738564] 2025-09-07T07:47:32.7389981Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:32.7391184Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_linalg.py', '-m', 'not serial', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:32.738860] 2025-09-07T07:47:33.3180307Z 2025-09-07T07:47:33.3181262Z test_xnnpack_integration 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_2.4_2c8c41cbe740a5f7_.log 2025-09-07T07:47:33.3183820Z Running 3 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose 2025-09-07T07:47:33.3185256Z 2025-09-07T07:47:33.3185520Z Running test_mkldnn_fusion 1/1 ... [2025-09-07 07:47:33.318186] 2025-09-07T07:47:33.3186077Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:33.3188126Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn_fusion.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:33.318591] 2025-09-07T07:47:36.0752399Z 2025-09-07T07:47:36.0753324Z test_mkldnn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_1.1_26122fcd01518475_.log 2025-09-07T07:47:36.0754107Z Running 0 items in this shard: 2025-09-07T07:47:36.0754331Z 2025-09-07T07:47:36.0755184Z Running test_sparse_csr 1/1 ... 
[2025-09-07 07:47:36.075315] 2025-09-07T07:47:36.0756163Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:47:36.0759341Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_csr.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:47:36.075743] 2025-09-07T07:48:43.0499167Z 2025-09-07T07:48:43.0501468Z test_mkldnn_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_fusion_1.1_0b6ade63f6040fe0_.log 2025-09-07T07:48:43.0506610Z Running 8 items in this shard: test/test_mkldnn_fusion.py::TestMkldnnFusion::test_conv_binary_fusion_ops, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_conv_transpose_unary_fusion_ops, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_conv_unary_fusion_nnc, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_conv_unary_fusion_ops, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_linear_binary_fusion_ops, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_linear_unary_fusion_ops, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_single_conv, test/test_mkldnn_fusion.py::TestMkldnnFusion::test_unsupported_conv 2025-09-07T07:48:43.0510149Z 2025-09-07T07:48:43.0511307Z Running test_type_promotion 1/1 ... [2025-09-07 07:48:43.049916] 2025-09-07T07:48:43.0511974Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:48:43.0514526Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_promotion.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:48:43.050320] 2025-09-07T07:48:47.9882131Z 2025-09-07T07:48:47.9883964Z test_type_promotion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_promotion_1.1_59965d411eb0ff46_.log 2025-09-07T07:48:48.0024672Z Running 423 items in this shard: test/test_type_promotion.py::TestTypePromotionCUDA::test_add_wrapped_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alpha_mismatch_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alternate_result_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_bfloat16_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_booleans_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_can_cast_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_out_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_comparison_ops_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_assertraises_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_scalar_mult_tensor_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_computation_ignores_out_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_create_bool_tensors_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_float_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_from_issue_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_fail_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_inplace_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_to_float_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_lt_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_many_promotions_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_mixed_type_backward_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_non_promoting_ops_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int16, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex128, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_self_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_types_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_uint8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_tensor_vs_scalar_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_add_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_mul_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_sub_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_ternary_out_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_transpose_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unsigned_cuda 2025-09-07T07:48:48.0155795Z 2025-09-07T07:48:48.0156023Z Running torch_np/test_reductions 1/1 ... [2025-09-07 07:48:47.989107] 2025-09-07T07:48:48.0156411Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:48:48.0157349Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_reductions.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:48:47.989546] 2025-09-07T07:48:53.6802920Z 2025-09-07T07:48:53.6822419Z torch_np/test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_reductions_1.1_d6345091c8712b79_.log 2025-09-07T07:48:53.7141011Z Running 966 items in this shard: test/torch_np/test_reductions.py::TestFlatnonzero::test_basic, test/torch_np/test_reductions.py::TestAny::test_basic, test/torch_np/test_reductions.py::TestAny::test_method_vs_function, test/torch_np/test_reductions.py::TestAny::test_nd, test/torch_np/test_reductions.py::TestAll::test_basic, test/torch_np/test_reductions.py::TestAll::test_method_vs_function, test/torch_np/test_reductions.py::TestAll::test_nd, test/torch_np/test_reductions.py::TestMean::test_mean, test/torch_np/test_reductions.py::TestMean::test_mean_float16, test/torch_np/test_reductions.py::TestMean::test_mean_values, test/torch_np/test_reductions.py::TestMean::test_mean_where, test/torch_np/test_reductions.py::TestSum::test_sum, test/torch_np/test_reductions.py::TestSum::test_sum_boolean, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_2, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_warnings, test/torch_np/test_reductions.py::TestSum::test_sum_initial, test/torch_np/test_reductions.py::TestSum::test_sum_stability, test/torch_np/test_reductions.py::TestSum::test_sum_where, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func11, 
test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func10, 
test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func9, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func1 2025-09-07T07:48:53.7451605Z 2025-09-07T07:48:53.7451768Z Running test_dlpack 1/1 ... [2025-09-07 07:48:53.682024] 2025-09-07T07:48:53.7452105Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:48:53.7452990Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dlpack.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:48:53.682413] 2025-09-07T07:48:58.0035327Z 2025-09-07T07:48:58.0036931Z test_dlpack 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dlpack_1.1_113eeee9f8bd0aca_.log 2025-09-07T07:48:58.0084551Z Running 142 items in this shard: test/test_dlpack.py::TestTorchDlPackCUDA::test_automatically_select_in_creation_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_copy_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_bool, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int8, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_convert_default_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_cuda_per_thread_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_default_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_is_conj_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_non_strided_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_requires_grad_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_cpu_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_cuda_streams_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_rocm_streams_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_normalize_strides_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint16, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_shared_storage_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_on_different_device_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_unsupported_dtype_error_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float32, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_bool, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_max_version_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_needs_copy_error_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_no_copy_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int32, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_unsupported_device_error_cuda 2025-09-07T07:48:58.0123007Z 2025-09-07T07:48:58.0123267Z Running torch_np/numpy_tests/core/test_scalar_ctors 1/1 ... [2025-09-07 07:48:58.003807] 2025-09-07T07:48:58.0123704Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:48:58.0124691Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_scalar_ctors.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:48:58.004201] 2025-09-07T07:49:01.8939397Z 2025-09-07T07:49:01.8941248Z torch_np/numpy_tests/core/test_scalar_ctors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_scalar_ctors_1.1_b0d5d22ca36c84dd_.log 2025-09-07T07:49:01.8965284Z Running 65 items in this shard: test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_bool, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_floating, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_floating_overflow, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromInt::test_intp, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromInt::test_uint64_from_negative, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_t25, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_intc, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t20, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t23, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t23, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t23 2025-09-07T07:49:01.9054155Z 2025-09-07T07:49:01.9054661Z Running profiler/test_profiler_tree 1/1 ... [2025-09-07 07:49:01.894202] 2025-09-07T07:49:01.9055097Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:01.9056070Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler_tree.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:49:01.895208] 2025-09-07T07:49:06.2164146Z 2025-09-07T07:49:06.2165290Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_e270acd44e24e84e_.log 2025-09-07T07:49:06.2172782Z Running 10 items in this shard: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_detailed, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_with_stream, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory_and_stack, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_record_function, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_dispatch, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_function 2025-09-07T07:49:06.2179616Z 2025-09-07T07:49:06.2179981Z Running test_prims 1/1 ... [2025-09-07 07:49:06.216484] 2025-09-07T07:49:06.2180584Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:06.2181975Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_prims.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:49:06.216911] 2025-09-07T07:49:11.2976990Z 2025-09-07T07:49:11.2978068Z test_prims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_prims_1.1_a5db58609fa1e6f2_.log 2025-09-07T07:49:11.2988283Z Running 24 items in this shard: test/test_prims.py::TestPrimsBasic::test_check_deprecation_warning, test/test_prims.py::TestPrimsBasic::test_clone_complex, test/test_prims.py::TestPrimsBasic::test_mul_complex, test/test_prims.py::TestPrimsBasic::test_torch_ops, test/test_prims.py::TestPrimsCUDA::test_aten_overload_to_prims_cuda, test/test_prims.py::TestPrimsCUDA::test_broadcast_in_dim_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_broadcast_in_dim_sum_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_cbrt_prim_cuda_float64, test/test_prims.py::TestPrimsCUDA::test_cbrt_prim_cuda_int64, test/test_prims.py::TestPrimsCUDA::test_collapse_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_functional_rng_wrappers_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_memory_format_strides_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_philox_rand_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_reshape_view_method_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_var_correction_0_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_var_correction_1_cuda_float32, test/test_prims.py::TestRefsCUDA::test_constant_pad_nd_memory_format_cuda_float32, test/test_prims.py::TestRefsCUDA::test_inferred_tags_cuda, test/test_prims.py::TestRefsCUDA::test_infinite_loop_from_py_dispatcher_cuda, test/test_prims.py::TestRefsCUDA::test_linspace_with_complex_input_cuda, test/test_prims.py::TestRefsCUDA::test_logspace_with_complex_input_cuda, test/test_prims.py::TestRefsCUDA::test_unbind_cuda, test/test_prims.py::TestDecompCUDA::test_decomposition_method_vararg_ones_cuda_float32, test/test_prims.py::TestDecompCUDA::test_decomposition_method_vararg_permute_cuda_float32 2025-09-07T07:49:11.2996756Z 
2025-09-07T07:49:11.2997154Z Running test_jit_autocast 1/1 ... [2025-09-07 07:49:11.297706] 2025-09-07T07:49:11.2997726Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:11.2998855Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:49:11.298121] 2025-09-07T07:49:16.8719298Z 2025-09-07T07:49:16.8720572Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_c2d3520da71a05b4_.log 2025-09-07T07:49:16.8770920Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, 
test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, 
test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing 2025-09-07T07:49:16.8783946Z 2025-09-07T07:49:16.8784144Z Running profiler/test_torch_tidy 1/1 ... [2025-09-07 07:49:16.872039] 2025-09-07T07:49:16.8784530Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:16.8785465Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_torch_tidy.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:49:16.872471] 2025-09-07T07:49:20.8649760Z 2025-09-07T07:49:20.8650980Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_d6a47c58ff25e029_.log 2025-09-07T07:49:20.8662922Z Running 22 items in this shard: test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_id_uniqueness, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids_with_other_ops, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocations, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_extra_fields, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_impl_reuse, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_mkldnn_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_module_and_optimizer_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_nnmodule_params, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_adam, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_sgd, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_pointers_and_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_refcounts, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_scalar_ins, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_sparse_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_lists, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_properties, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_full, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_keep_alive, 
test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_scalar_args, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_set 2025-09-07T07:49:20.8672481Z 2025-09-07T07:49:20.8696455Z Running profiler/test_python_tracer 1/1 ... [2025-09-07 07:49:20.865126] 2025-09-07T07:49:20.8697504Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:20.8703054Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_python_tracer.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:49:20.870042] 2025-09-07T07:49:24.6446304Z 2025-09-07T07:49:24.6448608Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_a9c7af3ce10671c6_.log 2025-09-07T07:49:24.6451301Z Running 3 items in this shard: test/profiler/test_python_tracer.py::TestPythonTracer::test_method_with_c_function, test/profiler/test_python_tracer.py::TestPythonTracer::test_monitoring_callback, test/profiler/test_python_tracer.py::TestPythonTracer::test_unexpected_c_return_events 2025-09-07T07:49:24.6455218Z 2025-09-07T07:49:24.6464166Z Running lazy/test_reuse_ir 1/1 ... [2025-09-07 07:49:24.644995] 2025-09-07T07:49:24.6465286Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:24.6470109Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_reuse_ir.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:49:24.646761] 2025-09-07T07:49:28.4344225Z 2025-09-07T07:49:28.4345293Z lazy/test_reuse_ir 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_reuse_ir_1.1_d3b63322d4c4b1ff_.log 2025-09-07T07:49:28.4347373Z Running 4 items in this shard: test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAdd, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAddSub, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAddSubFallback, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testBatchNorm 2025-09-07T07:49:28.4348778Z 2025-09-07T07:49:28.4349261Z Running test_quantization 1/13 ... [2025-09-07 07:49:28.434433] 2025-09-07T07:49:28.4349859Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:49:28.4352492Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:49:28.434954] 2025-09-07T07:51:07.4777891Z 2025-09-07T07:51:07.4779530Z inductor/test_torchinductor_opinfo 5/12 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_5.12_24d82e4a28d1e1cc_.log 2025-09-07T07:51:07.4923092Z Running 314 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__native_batch_norm_legit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_einsum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hypot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_inner_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_renorm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_uint8
2025-09-07T07:51:07.5052366Z
2025-09-07T07:51:07.5052550Z Running test_quantization 2/13 ... [2025-09-07 07:51:07.478842]
2025-09-07T07:51:07.5052937Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:51:07.5053890Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=2', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:51:07.479349]
2025-09-07T07:52:23.4242796Z
2025-09-07T07:52:23.4243724Z test_decomp 22/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_22.22_2195a5a6b148ce33_.log
2025-09-07T07:52:23.4358503Z Running 431 items in this shard: test/test_decomp.py::TestDecompCUDA::test_bernoulli_default_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float8_e4m3fn, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polar_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_quantile_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_bartlett_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_real_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool2d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e4m3fnuz, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_int8, test/test_decomp.py::DecompOneOffTestsCUDA::test_native_layer_norm_cpu_decomp_cuda, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float32 2025-09-07T07:52:23.4464760Z 2025-09-07T07:52:23.4464942Z Running test_quantization 5/13 ... [2025-09-07 07:52:23.425090] 2025-09-07T07:52:23.4465297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:52:23.4466205Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=5', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:52:23.425490] 2025-09-07T07:53:51.2470715Z 2025-09-07T07:53:51.2471959Z test_quantization 5/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_5.13_ef1b2d6b8c7dcc2c_.log 2025-09-07T07:53:51.2506719Z Running 103 items in this shard: test/test_quantization.py::TestQuantizedOps::test_custom_module_multi_head_attention, test/test_quantization.py::TestQuantizedOps::test_int8_batch_norm_onednn, test/test_quantization.py::TestQuantizedOps::test_leaky_relu_observed_output, test/test_quantization.py::TestQuantizedOps::test_max_pool3d_nhwc, test/test_quantization.py::TestQuantizedOps::test_qadd_relu_same_qparams, test/test_quantization.py::TestQuantizedOps::test_quantized_equal, test/test_quantization.py::TestQNNPackOps::test_mean, test/test_quantization.py::TestQNNPackOps::test_qnnpack_tanh, test/test_quantization.py::TestQuantizedLinear::test_qlinear_add_relu_pt2e, test/test_quantization.py::TestQuantizedLinear::test_qlinear_gelu_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv1d, test/test_quantization.py::TestQuantizedConv::test_qconv2d_add, test/test_quantization.py::TestQuantizedConv::test_qconv2d_cudnn, test/test_quantization.py::TestQuantizedConv::test_qconv2d_hardswish_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_hardtanh_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_relu_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv_transpose3d, test/test_quantization.py::TestDynamicQuantizedOps::test_unpacked_qlinear_dynamic_fp16, test/test_quantization.py::TestComparatorOps::test_compare_tensor_tensor, test/test_quantization.py::TestFakeQuantizeOps::test_backward_per_tensor, test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_channel_cachemask_cuda, test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_tensor, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_backward_per_channel_cuda, 
test/test_quantization.py::TestFusedObsFakeQuant::test_fused_backward_op_fake_quant_off, test/test_quantization.py::TestFusedObsFakeQuant::test_fused_obs_fake_quant_backward_op, test/test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward, test/test_quantization.py::TestQuantizedTensor::test_qtensor_creation, test/test_quantization.py::TestQuantizedTensor::test_qtensor_float_assignment, test/test_quantization.py::TestQuantizedTensor::test_qtensor_quant_dequant, test/test_quantization.py::TestQuantizedTensor::test_qtensor_quantize_per_channel, test/test_quantization.py::TestFakeQuantize::test_fq_module_per_channel, test/test_quantization.py::TestObserver::test_zero_numel, test/test_quantization.py::TestStaticQuantizedModule::test_conv3d_relu_api, test/test_quantization.py::TestDynamicQuantizedModule::test_gru_api, test/test_quantization.py::TestFusedObsFakeQuantModule::test_embedding_bag_qat_config, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_fused_module, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_single_layer, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_skip_quant, test/test_quantization.py::TestQuantizeEagerOps::test_conv_1d, test/test_quantization.py::TestQuantizeEagerQATNumerics::test_fixed_qparam_ops, test/test_quantization.py::TestQuantizeEagerQATNumerics::test_leaky_relu, test/test_quantization.py::TestModelNumericsEager::test_fake_quant_true_quant_compare, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_partial, test/test_quantization.py::TestNumericSuiteEager::test_compare_weights_linear_dynamic, test/test_quantization.py::TestNumericSuiteEager::test_shadow_logger, test/test_quantization.py::TestEqualizeEager::test_cross_layer_equalization, test/test_quantization.py::TestFuseFx::test_linear_bn_leaky_relu_not_fused_by_default, test/test_quantization.py::TestQuantizeFx::test_convert_custom_config_to_dict, 
test/test_quantization.py::TestQuantizeFx::test_copy_node_has_shared_actpp_instance, test/test_quantization.py::TestQuantizeFx::test_fp32_input_fp32_output, test/test_quantization.py::TestQuantizeFx::test_permute_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFx::test_prepared_model_deepcopy, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_set_global, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_set_object_type, test/test_quantization.py::TestQuantizeFx::test_quantized_input_quantized_output, test/test_quantization.py::TestQuantizeFx::test_reroute_tuple_getitem_patterns, test/test_quantization.py::TestQuantizeFx::test_symmetric_qnnpack_qat_qconfig_mapping, test/test_quantization.py::TestQuantizeFx::test_torch_unsqueeze_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFxOps::test_add, test/test_quantization.py::TestQuantizeFxOps::test_add_relu_multiple_uses_of_relu, test/test_quantization.py::TestQuantizeFxOps::test_bmm_int_reference, test/test_quantization.py::TestQuantizeFxOps::test_copy_node_fp32_input, test/test_quantization.py::TestQuantizeFxOps::test_quantized_add_qat, test/test_quantization.py::TestQuantizeFxOps::test_silu_reference, test/test_quantization.py::TestQuantizeFxModels::test_torchvision, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_writer_replace_consecutive_submodules, test/test_quantization.py::TestGraphUtils::test_customized_equivalet_types_dict, test/test_quantization.py::TestDuplicateDQPass::test_no_add_quant_duplicate_dq, test/test_quantization.py::TestNumericDebugger::test_re_export_preserve_handle, test/test_quantization.py::TestNumericDebugger::test_run_decompositions_map_handle_to_new_nodes, test/test_quantization.py::TestNumericDebugger::test_run_decompositions_same_handle_id, test/test_quantization.py::TestQuantizePT2E::test_fixed_qparams_qspec_ptq, 
test/test_quantization.py::TestQuantizePT2E::test_preserve_nn_module_stack, test/test_quantization.py::TestPT2ERepresentation::test_dynamic_linear, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_attention_block, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_cat_recipe_same_inputs, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_unary_dynamic, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_and_module_type_with_mixed_configs, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_fold_bn_erases_add_node, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_per_channel_weight_custom_dtype, test/test_quantization.py::TestFXGraphMatcher::test_simple_fusion, test/test_quantization.py::TestFXGraphMatcher::test_simple_mod, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_int8_fun, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_deduplication, test/test_quantization.py::TestFxModelReportDetector::test_multiple_q_config_options, test/test_quantization.py::TestFxModelReportClass::test_equalization_mapping_generation, test/test_quantization.py::TestFxModelReportClass::test_generate_report, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_convert, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_weights_bias, test/test_quantization.py::TestQuantizeJit::test_observer_with_ignored_function, test/test_quantization.py::TestQuantizeJitOps::test_quantized_cat, test/test_quantization.py::TestQuantizeJitOps::test_quantized_mul, test/test_quantization.py::TestQuantizeJitOps::test_quantized_mul_scalar, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_utils, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_activation, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_functional_modules, 
test/test_quantization.py::TestAOMigrationNNIntrinsic::test_modules_no_import_nn_intrinsic_quantized_dynamic, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_fusion_patterns, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_soak_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_finfo_cuda_float8_e5m2, test/test_quantization.py::TestFloat8DtypeCUDA::test_finfo_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_special_numbers_cuda_float8_e5m2 2025-09-07T07:53:51.2533466Z 2025-09-07T07:53:51.2533641Z Running test_quantization 6/13 ... [2025-09-07 07:53:51.247324] 2025-09-07T07:53:51.2534117Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:53:51.2535102Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=6', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:53:51.248026] 2025-09-07T07:56:31.0666667Z 2025-09-07T07:56:31.0669152Z test_linalg 2/3 was successful, full logs can be found in artifacts with path test/test-reports/test_linalg_2.3_a1dfaf1a232d12ea_.log 2025-09-07T07:56:31.0787132Z Running 404 items in this shard: test/test_linalg.py::TestLinalgCUDA::test_1_sized_with_0_strided_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_1_sized_with_0_strided_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_matmul_4bit_m_1_k_128_n_11008_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_matmul_4bit_m_1_k_128_n_4096_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_matmul_4bit_m_1_k_64_n_4096_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_matmul_4bit_m_32_k_128_n_4096_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_matmul_4bit_m_32_k_64_n_11008_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_matmul_4bit_m_32_k_64_n_4096_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_pack_4bit_weight_k_256_n_64_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_pack_4bit_weight_k_64_n_128_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_pack_4bit_weight_k_64_n_32_cuda, test/test_linalg.py::TestLinalgCUDA::test__dyn_quant_pack_4bit_weight_k_64_n_48_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_large_shape_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_32_k_32_n_48_compile_False_slice_False_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_32_k_32_n_48_compile_True_slice_True_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_64_k_32_n_48_compile_False_slice_False_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_64_k_32_n_64_compile_True_slice_False_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_64_k_64_n_48_compile_True_slice_False_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_64_k_64_n_64_compile_False_slice_False_cuda, 
test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_64_k_64_n_64_compile_False_slice_True_cuda, test/test_linalg.py::TestLinalgCUDA::test__int8_mm_m_64_k_64_n_64_compile_True_slice_False_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_16_use_transpose_a_False_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_16_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_0_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_False_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, 
test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_16_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_32_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_32_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_32_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_0_k_32_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, 
test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_0_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_16_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_16_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_32_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_32_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_32_n_16_use_transpose_a_True_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_32_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_32_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_0_cuda, 
test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_17_k_32_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_16_use_transpose_a_True_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_16_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_0_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_16_use_transpose_a_False_use_transpose_b_True_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_16_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, 
test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_32_use_transpose_a_False_use_transpose_b_False_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_32_use_transpose_a_True_use_transpose_b_False_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_16_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_16_use_transpose_a_False_use_transpose_b_False_non_contig_type_0_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_16_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_16_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_32_use_transpose_a_False_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_1_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_cpu_m_8_k_32_n_32_use_transpose_a_True_use_transpose_b_True_non_contig_type_2_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_False_cuda, test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_False_cuda, test/test_linalg.py::TestLinalgCUDA::test_addbmm_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_addbmm_cuda_float16, 
test/test_linalg.py::TestLinalgCUDA::test_addmm_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_addmm_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_addmm_gelu_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_gelu_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_0_0_beta_0_5_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_0_0_beta_0_5_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_0_2_beta_0_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_0_2_beta_0_5_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_0_2_beta_1_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_1_0_beta_0_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_1_0_beta_0_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_False_alpha_1_0_beta_0_5_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_0_beta_0_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_0_beta_0_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_0_beta_0_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_0_beta_0_5_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_0_beta_0_5_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_0_beta_1_0_cuda_bfloat16, 
test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_2_beta_0_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_2_beta_0_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_2_beta_1_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_0_2_beta_1_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_1_0_beta_0_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_1_0_beta_0_5_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_1_0_beta_0_5_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_1_0_beta_1_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_False_transpose_b_True_alpha_1_0_beta_1_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_False_alpha_0_0_beta_0_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_False_alpha_0_0_beta_1_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_False_alpha_0_2_beta_0_5_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_False_alpha_1_0_beta_0_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_False_alpha_1_0_beta_0_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_0_beta_0_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_0_beta_0_5_cuda_float16, 
test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_0_beta_0_5_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_0_beta_1_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_0_beta_1_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_0_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_0_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_0_5_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_1_0_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_0_2_beta_1_0_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_1_0_beta_0_5_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_1_0_beta_0_5_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_1_0_beta_0_5_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_mv_transpose_a_True_transpose_b_True_alpha_1_0_beta_1_0_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_relu_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmm_relu_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_addmm_relu_tunableop_rocm_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmm_sizes_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_addmv_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addmv_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_addr_bool_cuda_bool, 
test/test_linalg.py::TestLinalgCUDA::test_addr_float_and_complex_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_addr_float_and_complex_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_addr_float_and_complex_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_input_dtypes_compatibility_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_input_dtypes_compatibility_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_input_dtypes_compatibility_cuda_int16, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_input_dtypes_compatibility_cuda_int64, test/test_linalg.py::TestLinalgCUDA::test_baddbmm_nan_input_with_zero_beta_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_blas_alpha_beta_empty_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_blas_alpha_beta_empty_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_blas_alpha_beta_empty_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_blas_nan_out_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_blas_nan_out_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_blas_nan_out_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_blaslog_tunableop_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_bmm_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_bmm_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_call_count_tunableop_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cholesky_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_cholesky_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cholesky_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_cholesky_ex_non_pd_cuda_float32, 
test/test_linalg.py::TestLinalgCUDA::test_cholesky_inverse_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_cholesky_inverse_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_cholesky_inverse_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_inverse_errors_and_warnings_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cholesky_inverse_errors_and_warnings_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_batched_broadcasting_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_batched_broadcasting_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_batched_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_batched_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_batched_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_batched_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_out_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_cholesky_solve_out_errors_and_warnings_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_compile_dyn_quant_matmul_4bit_m_1_k_64_n_11008_cuda, test/test_linalg.py::TestLinalgCUDA::test_compile_dyn_quant_matmul_4bit_m_32_k_128_n_11008_cuda, test/test_linalg.py::TestLinalgCUDA::test_compile_int4_mm_m_64_k_64_n_48_cuda, test/test_linalg.py::TestLinalgCUDA::test_compile_int4_mm_m_64_k_64_n_64_cuda, test/test_linalg.py::TestLinalgCUDA::test_cond_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_corner_cases_of_cublasltmatmul_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_corner_cases_of_cublasltmatmul_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cross_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_cross_error_cuda, test/test_linalg.py::TestLinalgCUDA::test_det_logdet_slogdet_batched_cuda_float64, 
test/test_linalg.py::TestLinalgCUDA::test_det_logdet_slogdet_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_dot_invalid_args_cuda, test/test_linalg.py::TestLinalgCUDA::test_dot_vs_numpy_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_dot_vs_numpy_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_dump_results_on_exit_tunableop_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eig_check_magma_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eig_compare_backends_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eig_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_eig_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eig_errors_and_warnings_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eig_removed_error_cuda, test/test_linalg.py::TestLinalgCUDA::test_eig_with_nan_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eig_with_nan_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_eigh_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_eigh_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eigh_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eigh_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eigh_lower_uplo_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eigh_lwork_lapack_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eigh_lwork_lapack_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eigh_svd_illcondition_matrix_input_should_not_crash_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eigh_svd_illcondition_matrix_input_should_not_crash_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_eigvals_compare_backends_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_eigvals_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_eigvals_errors_and_warnings_cuda_complex64, 
test/test_linalg.py::TestLinalgCUDA::test_eigvalsh_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_eigvalsh_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_eigvalsh_errors_and_warnings_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_einsum_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_32_k_35_cuda, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_32_k_40_cuda, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_32_k_64_cuda, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_35_k_32_cuda, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_35_k_36_cuda, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_36_k_36_cuda, test/test_linalg.py::TestLinalgCUDA::test_fp16_mv_transposed_first_argument_arm_cpu_m_64_k_64_cuda, test/test_linalg.py::TestLinalgCUDA::test_geqrf_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_householder_product_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_inner_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_inv_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_inv_errors_and_warnings_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_inv_errors_and_warnings_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_inv_ex_info_device_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_inv_ex_info_device_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_inv_ex_singular_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_invariance_error_spectral_decompositions_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_inverse_errors_large_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_inverse_many_batches_cuda_float32, 
test/test_linalg.py::TestLinalgCUDA::test_kron_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_kron_empty_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_kron_empty_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_kron_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_kron_errors_and_warnings_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_lapack_empty_cuda, test/test_linalg.py::TestLinalgCUDA::test_ldl_factor_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_ldl_factor_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_ldl_solve_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_ldl_solve_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_ldl_solve_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_linalg_cross_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_linalg_cross_with_and_without_dim_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_batch_broadcasting_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_input_checks_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_linalg_lu_cpu_errors_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_linalg_lu_solve_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_linalg_lu_solve_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_analytic_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_batch_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_boundary_cases_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_compare_with_taylor_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_no_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_perverse_nan_values_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_linalg_matrix_exp_perverse_nan_values_cuda_float64, 
test/test_linalg.py::TestLinalgCUDA::test_linalg_solve_triangular_broadcasting_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_linalg_solve_triangular_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_linalg_solve_triangular_large_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_linear_algebra_scalar_raises_cuda, test/test_linalg.py::TestLinalgCUDA::test_lobpcg_basic_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_lu_solve_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_lu_solve_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_lu_solve_large_matrices_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_matmul_check_entries_tunableop_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_matmul_mv_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_matmul_offline_tunableop_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_matmul_small_brute_force_1d_Nd_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_matrix_power_non_negative_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_matrix_rank_atol_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_matrix_rank_atol_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_matrix_rank_atol_rtol_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_matrix_rank_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_matrix_rank_empty_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_matrix_rank_out_errors_and_warnings_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_mm_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_mm_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_mm_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_mm_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_mm_submatrix_offline_tunableop_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_multi_dot_errors_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_norm_complexhalf_cuda, 
test/test_linalg.py::TestLinalgCUDA::test_norm_dtype_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_norm_dtype_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_norm_dtype_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_norm_errors_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_norm_errors_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_norm_fastpaths_cuda, test/test_linalg.py::TestLinalgCUDA::test_norm_fused_type_promotion_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_norm_matrix_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_norm_vector_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_norm_vector_degenerate_shapes_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_norm_vector_degenerate_shapes_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_nuclear_norm_axes_small_brute_force_old_cuda, test/test_linalg.py::TestLinalgCUDA::test_nuclear_norm_out_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_numeric_check_leak_tunableop_rocm_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_old_cholesky_batched_upper_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_old_cholesky_batched_upper_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_old_cholesky_empty_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_ormqr_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_ormqr_errors_and_warnings_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_outer_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_outer_cuda_float16, test/test_linalg.py::TestLinalgCUDA::test_outer_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_outer_cuda_int8, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bfloat16_float16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bfloat16_float32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bfloat16_int16, 
test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bfloat16_uint8, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bool_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bool_complex64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bool_int16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_bool_int32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex128_complex128, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex128_complex64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex128_int32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex128_int64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex64_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex64_complex64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex64_float64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_complex64_int32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float16_int16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float16_int64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float16_uint8, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float32_bool, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float32_float64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float32_int16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float32_uint8, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float64_bool, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float64_float16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float64_float64, 
test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float64_int16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_float64_int32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int16_complex64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int16_float32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int16_float64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int16_int32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_bool, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_complex128, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_complex64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_float16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_float32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_float64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_int64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int32_int8, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int64_complex128, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int64_int32, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int8_bool, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int8_complex128, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_int8_int8, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_uint8_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_uint8_complex64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_uint8_int64, test/test_linalg.py::TestLinalgCUDA::test_outer_type_promotion_cuda_uint8_int8, test/test_linalg.py::TestLinalgCUDA::test_permute_matmul_cuda, 
test/test_linalg.py::TestLinalgCUDA::test_pinv_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_pinverse_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_pinverse_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_qr_batched_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_qr_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_qr_vs_numpy_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_scaled_gemm_offline_tunableop_cuda_float8_e4m3fnuz, test/test_linalg.py::TestLinalgCUDA::test_slogdet_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_slogdet_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_solve_batched_broadcasting_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_solve_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_strided_mm_bmm_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_tensordot_out_kernel_errors_with_autograd_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_tensordot_out_kernel_errors_with_autograd_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_empty_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_empty_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_empty_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_errors_and_warnings_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_singular_input_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_singular_input_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_tensorinv_singular_input_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_tensorsolve_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_tensorsolve_cuda_float64, 
test/test_linalg.py::TestLinalgCUDA::test_tensorsolve_empty_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_tensorsolve_errors_and_warnings_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_triangular_solve_batched_broadcasting_cuda_complex64, test/test_linalg.py::TestLinalgCUDA::test_triangular_solve_batched_many_batches_cuda_complex128, test/test_linalg.py::TestLinalgCUDA::test_triangular_solve_batched_many_batches_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_triangular_solve_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_triangular_solve_large_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_triangular_solve_out_errors_and_warnings_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_vdot_vs_numpy_cuda_float32, test/test_linalg.py::TestLinalgCUDA::test_vector_norm_cuda_float32 2025-09-07T07:56:31.0912918Z 2025-09-07T07:56:31.0913189Z Running test_quantization 9/13 ... [2025-09-07 07:56:31.067868] 2025-09-07T07:56:31.0913581Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:56:31.0914492Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=9', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:56:31.068732] 2025-09-07T07:57:08.3265716Z 2025-09-07T07:57:08.3267454Z test_sparse_csr 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_csr_1.1_1342331952eac7ae_.log 2025-09-07T07:57:08.5083437Z Running 4958 items in this shard: test/test_sparse_csr.py::TestSparseCSRSampler::test_make_crow_indices, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSR_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_errors_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_11x9_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_11x9_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_11x9_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_11x9_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_5x7_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_5x7_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_5x7_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_5x7_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_dense_output_addmm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_dense_output_addmv_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_dense_output_mm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_dense_output_mv_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atan_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_log1p_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_round_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tanh_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_baddbmm_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_baddbmm_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_baddbmm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_baddbmm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSC_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSC_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSC_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSC_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSR_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSR_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSR_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSR_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSC_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSC_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSC_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSC_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSR_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSR_SparseBSR_cuda, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSR_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSR_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_to_csr_convert_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_double_to_sparse_csr_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_is_contiguous_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_nnz_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_storage_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_stride_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_blocksize_2_cuda_float64_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_blocksize_2_cuda_float64_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_blocksize_4_cuda_float64_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_blocksize_4_cuda_float64_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_errors_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSC_Batched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSC_Batched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_Hybrid_cuda, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSR_Batched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSR_Batched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSR_NonBatched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSR_NonBatched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_Batched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_Batched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSR_Batched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSR_Batched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSR_NonBatched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSR_NonBatched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_linalg_solve_sparse_csr_cusolver_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_linalg_solve_sparse_csr_cusolver_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_matmul_device_mismatch_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mm_errors_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_autograd_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_autograd_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_autograd_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_autograd_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_int64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_int32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_int32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_int8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_float32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_int32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_bool, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_bool, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_int8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_to_sparse_compressed_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_to_sparse_compressed_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_to_sparse_compressed_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_to_sparse_compressed_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_triangular_solve_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_triangular_solve_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_triangular_solve_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_triangular_solve_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_int32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_int64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_float64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_bool, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_float16, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int8, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int64, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_int32, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_complex32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_int16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_dim_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_dim_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_dim_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_dim_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_int16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSR_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_complex64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseBSC_target_sparse_compressed_tensor_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseBSC_target_sparse_compressed_tensor_no_size_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseBSC_target_validate_sparse_compressed_tensor_args_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseBSR_target_sparse_compressed_tensor_cuda, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseBSR_target_sparse_compressed_tensor_no_size_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseBSR_target_validate_sparse_compressed_tensor_args_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSC_target_sparse_compressed_tensor_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSC_target_sparse_compressed_tensor_no_size_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSC_target_validate_sparse_compressed_tensor_args_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSR_target_sparse_compressed_tensor_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSR_target_sparse_compressed_tensor_no_size_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSR_target_validate_sparse_compressed_tensor_args_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_csr_large_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_layout_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_layout_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_layout_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_layout_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_print_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_print_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_print_SparseCSC_cuda, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_print_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_uint8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bool, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_complex128, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_int32, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_float64, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_TensorAsKey_cuda, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_addmm_meta_cuda, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_16_int32_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_16_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_16_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_16_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_16_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_16_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int64_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_error_messages_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16x32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16x32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16x32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2x3_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2x3_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2x3_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_32_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_softmax_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_softmax_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_softmax_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_float32, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_int32_cuda_int8, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_int32_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_unspecified_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_16_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_16_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_16_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_bfloat16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_16_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_16_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scatter_mm_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scatter_mm_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scatter_mm_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_unspecified_cuda_float16, 
test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_unspecified_cuda_int8 2025-09-07T07:57:08.6677020Z 2025-09-07T07:57:08.6677234Z Running test_quantization 10/13 ... [2025-09-07 07:57:08.351146] 2025-09-07T07:57:08.6677603Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:57:08.6678526Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=10', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:57:08.352006] 2025-09-07T07:58:35.5361422Z 2025-09-07T07:58:35.5362410Z test_quantization 10/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_10.13_cf7fe0b85d22a028_.log 2025-09-07T07:58:35.5397914Z Running 110 items in this shard: test/test_quantization.py::TestQuantizedOps::test_linear_bias_unpack, test/test_quantization.py::TestQuantizedOps::test_qadd_relu_cudnn_nhwc, test/test_quantization.py::TestQNNPackOps::test_avg_pool2d, test/test_quantization.py::TestQNNPackOps::test_qnnpack_mul, test/test_quantization.py::TestQuantizedLinear::test_qlinear_gelu_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv1d_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_sum_relu_float_output_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv3d, test/test_quantization.py::TestDynamicQuantizedOps::test_unpacked_qlinear_dynamic_fp16_opcheck, test/test_quantization.py::TestComparatorOps::test_compare_tensor_scalar, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quantize_per_channel_affine_scale_dtypes, test/test_quantization.py::TestFakeQuantizeOps::test_forward_backward_per_tensor_with_amp, 
test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_channel_cachemask_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_fq_module_per_tensor, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_backward_per_tensor_cuda, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_channel_cuda, test/test_quantization.py::TestQuantizedTensor::test_qtensor_dtypes, test/test_quantization.py::TestQuantizedTensor::test_qtensor_fill_per_channel, test/test_quantization.py::TestQuantizedTensor::test_qtensor_fill_per_channel_nhwc, test/test_quantization.py::TestQuantizedTensor::test_qtensor_fill_per_tensor, test/test_quantization.py::TestObserver::test_per_tensor_observers, test/test_quantization.py::TestObserver::test_save_load_state_dict_script, test/test_quantization.py::TestStaticQuantizedModule::test_conv2d_relu_api, test/test_quantization.py::TestStaticQuantizedModule::test_conv3d_api, test/test_quantization.py::TestDynamicQuantizedModule::test_dynamic_conv2d, test/test_quantization.py::TestFusedObsFakeQuantModule::test_embedding_qat_config, test/test_quantization.py::TestFusedObsFakeQuantModule::test_fused_mod_reduce_range, test/test_quantization.py::TestUtils::test_quantize_weight_clamping_per_channel, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_nested1, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_nested2, test/test_quantization.py::TestQuantizeEagerOps::test_conv_transpose_1d, test/test_quantization.py::TestQuantizeEagerOps::test_conv_transpose_2d, test/test_quantization.py::TestQuantizeEagerQAT::test_train_save_load_eval, test/test_quantization.py::TestQuantizeEagerQATNumerics::test_conv_bn_relu, test/test_quantization.py::TestFuseEager::test_fusion_conv_with_bias, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_functional_static, test/test_quantization.py::TestNumericSuiteEager::test_compare_weights_conv_static, 
test/test_quantization.py::TestFuseFx::test_problematic_fuse_example, test/test_quantization.py::TestQuantizeFx::test__convert_to_reference_decomposed_fx_dynamic_quant, test/test_quantization.py::TestQuantizeFx::test_backend_config_quantization_range, test/test_quantization.py::TestQuantizeFx::test_channel_shuffle_lowering, test/test_quantization.py::TestQuantizeFx::test_conv_transpose_not_reference, test/test_quantization.py::TestQuantizeFx::test_conv_transpose_relu_reference, test/test_quantization.py::TestQuantizeFx::test_deepcopy_preserve_attributes, test/test_quantization.py::TestQuantizeFx::test_default_qconfig_mapping_override_global, test/test_quantization.py::TestQuantizeFx::test_dynamic_quant_fp16, test/test_quantization.py::TestQuantizeFx::test_fuse_custom_config_to_dict, test/test_quantization.py::TestQuantizeFx::test_get_executorch_backend_config, test/test_quantization.py::TestQuantizeFx::test_lowering_functional_linear_with_kwargs, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_standalone_module_class, test/test_quantization.py::TestQuantizeFx::test_preserve_attributes, test/test_quantization.py::TestQuantizeFx::test_preserve_qconfig, test/test_quantization.py::TestQuantizeFx::test_qat_and_script, test/test_quantization.py::TestQuantizeFx::test_qconfig_dict_with_fused_modules, test/test_quantization.py::TestQuantizeFx::test_qconfig_for_call_method, test/test_quantization.py::TestQuantizeFxOps::test_cat, test/test_quantization.py::TestQuantizeFxOps::test_conv_transpose_1d, test/test_quantization.py::TestQuantizeFxOps::test_fixed_qparams_ops_qint8, test/test_quantization.py::TestQuantizeFxOps::test_linear_module, test/test_quantization.py::TestQuantizeFxOps::test_rnn_cell, test/test_quantization.py::TestQuantizeFxModels::test_prepare_serialize_switch_device_convert, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, 
test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_single_pattern_match, test/test_quantization.py::TestQuantizePT2E::test_allow_exported_model_train_eval_idempotent, test/test_quantization.py::TestQuantizePT2E::test_conv_transpose_bn_relu, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_bfloat16_float8_e4m3fn, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_float32_float8_e4m3fn, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_float32_float8_e5m2, test/test_quantization.py::TestQuantizePT2EAffineQuantization::test_channel_group_quantization, test/test_quantization.py::TestQuantizePT2EAffineQuantization::test_dynamic_per_tok_act_per_group_weights, test/test_quantization.py::TestPT2ERepresentation::test_conv2d, test/test_quantization.py::TestXNNPACKQuantizer::test_propagate_annotation, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_flatten_recipe, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary2, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_and_module_type_case2, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_transpose_bn_relu, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_preserve_source_fn_stack, test/test_quantization.py::TestFXGraphMatcher::test_simple_fun, test/test_quantization.py::TestFXGraphMatcher::test_simple_tensor_ops, test/test_quantization.py::TestFXGraphMatcher::test_user_defined_function, test/test_quantization.py::TestFXGraphMatcherModels::test_mobilenet_v2_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_linear_fun_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_fp32_simple, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_shadow_activations_fqn, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_user_defined_function, 
test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_functions, test/test_quantization.py::TestFXNumericSuiteNShadows::test_conv_bn_relu_mod, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_from_list, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_activations_linear, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_shadow_activations_linear, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_weights_linear, test/test_quantization.py::TestFxModelReportDetector::test_sequential_model_format, test/test_quantization.py::TestFxModelReportClass::test_prepare_model_callibration, test/test_quantization.py::TestFxDetectOutliers::test_no_outlier_report_gen, test/test_quantization.py::TestFxDetectOutliers::test_outlier_detection_determine_points, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_activation_values, test/test_quantization.py::TestQuantizeJit::test_conv_bn, test/test_quantization.py::TestQuantizeJitPasses::test_foldbn_shared_classtype, test/test_quantization.py::TestQuantizeJitPasses::test_replicate_dequantize_in_block, test/test_quantization.py::TestQuantizeJitOps::test_cat_linear, test/test_quantization.py::TestQuantizeJitOps::test_linear, test/test_quantization.py::TestQuantizeJitOps::test_qbatch_norm_relu_BNFuncInplaceRelu, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_quantization_mappings, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_qat_dynamic_linear, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_quantizable_activation, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_batchnorm, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_dropout, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_rte_cuda_float8_e8m0fnu, 
test/test_quantization.py::TestFloat8DtypeCUDA::test_empty_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_finfo_cuda_float8_e4m3fn 2025-09-07T07:58:35.5426978Z 2025-09-07T07:58:35.5427305Z Running test_quantization 13/13 ... [2025-09-07 07:58:35.536127] 2025-09-07T07:58:35.5427763Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:58:35.5428751Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=13', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:58:35.536487] 2025-09-07T08:01:20.1108944Z 2025-09-07T08:01:20.1110088Z test_quantization 13/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_13.13_d2770acf8155535e_.log 2025-09-07T08:01:20.1202787Z Running 89 items in this shard: test/test_quantization.py::TestQuantizedOps::test_adaptive_avg_pool, test/test_quantization.py::TestQuantizedOps::test_channel_shuffle, test/test_quantization.py::TestQuantizedOps::test_hardtanh, test/test_quantization.py::TestQuantizedOps::test_max_pool3d, test/test_quantization.py::TestQuantizedOps::test_qprelu, test/test_quantization.py::TestQuantizedOps::test_qsoftmax, test/test_quantization.py::TestQuantizedOps::test_quantized_mean_qnnpack, test/test_quantization.py::TestQuantizedLinear::test_qlinear_unpack, test/test_quantization.py::TestQuantizedConv::test_qconv2d_sum_relu_fp8, test/test_quantization.py::TestDynamicQuantizedOps::test_dynamic_conv2d, test/test_quantization.py::TestDynamicQuantizedOps::test_linear_prepack_fp16_numerics, test/test_quantization.py::TestFakeQuantizeOps::test_backward_per_channel, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quant_control, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quantize_per_tensor_affine_inf, 
test/test_quantization.py::TestFakeQuantizeOps::test_fixed_qparams_fq_module, test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_tensor_cachemask_cuda, test/test_quantization.py::TestQuantizedTensor::test_compare_per_tensor_device_numerics, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_tensor, test/test_quantization.py::TestQuantizedTensor::test_fp16_saturate_op, test/test_quantization.py::TestQuantizedTensor::test_per_channel_qtensor_creation_cuda, test/test_quantization.py::TestQuantizedTensor::test_qtensor_index_select_cuda, test/test_quantization.py::TestQuantizedTensor::test_qtensor_masked_fill_cpu, test/test_quantization.py::TestQuantizedTensor::test_qtensor_per_channel_permute, test/test_quantization.py::TestQuantizedTensor::test_qtensor_permute, test/test_quantization.py::TestObserver::test_histogram_observer_handle_close_to_infinity, test/test_quantization.py::TestDynamicQuantizedModule::test_dynamic_convtranspose2d, test/test_quantization.py::TestFusedObsFakeQuantModule::test_fused_obs_fq_module, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_num_tensor_args_to_observation_type, test/test_quantization.py::TestUtils::test_get_fqn_to_example_inputs_default_kwargs, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_dequant_stub, test/test_quantization.py::TestQuantizeEagerOps::test_functional_module, test/test_quantization.py::TestQuantizeEagerQAT::test_manual, test/test_quantization.py::TestFuseEager::test_fuse_function_customization, test/test_quantization.py::TestFuseEager::test_fusion_linear_bn_eval, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_outputs_linear_dynamic, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_outputs_linear_static, test/test_quantization.py::TestFuseFx::test_fuse_conv_bn_add_relu_onednn, test/test_quantization.py::TestFuseFx::test_fuse_module_relu, 
test/test_quantization.py::TestQuantizeFx::test_backend_config_check_for_weight_and_bias, test/test_quantization.py::TestQuantizeFx::test_change_backend_config_for_fixed_qparam_ops, test/test_quantization.py::TestQuantizeFx::test_convert_custom_config_set_observed_to_quantized_mapping, test/test_quantization.py::TestQuantizeFx::test_fp32_input_quantized_output, test/test_quantization.py::TestQuantizeFx::test_getattr_with_nontensor_result, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_set_module_name_object_type_order, test/test_quantization.py::TestQuantizeFx::test_qconfig_module_name_regex, test/test_quantization.py::TestQuantizeFx::test_qconfig_none, test/test_quantization.py::TestQuantizeFx::test_sequential, test/test_quantization.py::TestQuantizeFx::test_size_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFxOps::test_boolean_tensor, test/test_quantization.py::TestQuantizeFxOps::test_chunk, test/test_quantization.py::TestQuantizeFxOps::test_int8_input_no_unnecessary_fq, test/test_quantization.py::TestQuantizeFxOps::test_layer_norm, test/test_quantization.py::TestQuantizeFxOps::test_pixel_shuffle, test/test_quantization.py::TestQuantizeFxOps::test_reshape_fp16, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_preserves_logic, test/test_quantization.py::TestMetaDataPorting::test_metadata_porting_for_dq, test/test_quantization.py::TestMetaDataPorting::test_metadata_porting_for_two_dq, test/test_quantization.py::TestNumericDebugger::test_added_node_gets_unique_id, test/test_quantization.py::TestNumericDebugger::test_extract_results_from_loggers_list_output, test/test_quantization.py::TestQuantizePT2E::test_composable_quantizer_linear_conv, test/test_quantization.py::TestQuantizePT2E::test_constant_prop_preserve_metadata, test/test_quantization.py::TestQuantizePT2E::test_derived_qspec_per_channel, test/test_quantization.py::TestQuantizePT2E::test_move_exported_model_dropout, 
test/test_quantization.py::TestQuantizePT2E::test_simple_quantizer, test/test_quantization.py::TestPT2ERepresentation::test_maxpool2d, test/test_quantization.py::TestXNNPACKQuantizer::test_add_mul_scalar, test/test_quantization.py::TestXNNPACKQuantizer::test_dynamic_linear, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_qat_conv2d, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_bn_fusion_literal_args, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_transpose_bn, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_relu_fusion_cuda, test/test_quantization.py::TestFXGraphMatcher::test_nodes_with_equal_types_get_matched, test/test_quantization.py::TestFXGraphMatcher::test_simple_mod_multi, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_fqn, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_conv_bn_relu_fusion_quant, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_linear_mod_quant_quant, test/test_quantization.py::TestFXNumericSuiteNShadows::test_functions, test/test_quantization.py::TestSerialization::test_linear_relu_package_quantization_transforms, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_interface, test/test_quantization.py::TestQuantizeJitPasses::test_insert_quant_dequant_shared_class_type, test/test_quantization.py::TestQuantizeJitPasses::test_skip_dequant_constant_prop, test/test_quantization.py::TestQuantizeJitOps::test_dequantize_tuple, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_prepare_dynamic, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_prepare_dynamic_child_qconfig, test/test_quantization.py::TestQuantizeDynamicJitOps::test_embedding_bag_padding_idx_error, test/test_quantization.py::TestDeprecatedJitQuantized::test_erase_class_tensor_shapes, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_qconfig, 
test/test_quantization.py::TestFloat8DtypeCUDA::test_finfo_cuda_float8_e4m3fnuz, test/test_quantization.py::TestFloat8DtypeCUDA::test_float8_e8m0fnu_rne_rounding_cuda 2025-09-07T08:01:20.1226017Z 2025-09-07T08:04:35.0554742Z 2025-09-07T08:04:35.0556723Z test_dataloader 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_dataloader_2.2_254a125c4b3dfe5c_.log 2025-09-07T08:04:35.0587032Z Running 90 items in this shard: test/test_dataloader.py::TestDatasetRandomSplit::test_lengths_must_equal_dataset_size, test/test_dataloader.py::TestDatasetRandomSplit::test_slicing_of_subset_of_dataset, test/test_dataloader.py::TestDatasetRandomSplit::test_splits_generator, test/test_dataloader.py::TestDatasetRandomSplit::test_splits_have_correct_size, test/test_dataloader.py::TestDatasetRandomSplit::test_splits_reproducibility, test/test_dataloader.py::TestTensorDataset::test_getitem_1d, test/test_dataloader.py::TestTensorDataset::test_single_tensor, test/test_dataloader.py::TestStackDataset::test_empty, test/test_dataloader.py::TestStackDataset::test_len, test/test_dataloader.py::TestConcatDataset::test_add_dataset, test/test_dataloader.py::TestConcatDataset::test_concat_raises_index_error, test/test_dataloader.py::TestConcatDataset::test_concat_two_singletons, test/test_dataloader.py::TestDataLoader::test_batch_sampler, test/test_dataloader.py::TestDataLoader::test_bulk_loading_nobatch, test/test_dataloader.py::TestDataLoader::test_chain_iterable_style_dataset, test/test_dataloader.py::TestDataLoader::test_default_collate_bad_sequence_type, test/test_dataloader.py::TestDataLoader::test_default_collate_dtype, test/test_dataloader.py::TestDataLoader::test_default_collate_numpy_memmap, test/test_dataloader.py::TestDataLoader::test_default_convert_mapping_keep_type, test/test_dataloader.py::TestDataLoader::test_default_convert_sequence_keep_type, test/test_dataloader.py::TestDataLoader::test_distributed_sampler_invalid_rank, 
test/test_dataloader.py::TestDataLoader::test_error_workers, test/test_dataloader.py::TestDataLoader::test_growing_dataset, test/test_dataloader.py::TestDataLoader::test_invalid_ctor_args_combinations, test/test_dataloader.py::TestDataLoader::test_large_sampler_indices, test/test_dataloader.py::TestDataLoader::test_multiple_dataloaders, test/test_dataloader.py::TestDataLoader::test_multiprocessing_iterdatapipe, test/test_dataloader.py::TestDataLoader::test_numpy_gen_state, test/test_dataloader.py::TestDataLoader::test_numpy_scalars, test/test_dataloader.py::TestDataLoader::test_proper_exit, test/test_dataloader.py::TestDataLoader::test_segfault, test/test_dataloader.py::TestDataLoader::test_sequential_batch, test/test_dataloader.py::TestDataLoader::test_sequential_nonbatch, test/test_dataloader.py::TestDataLoader::test_sequential_pin_memory, test/test_dataloader.py::TestDataLoader::test_sequential_workers, test/test_dataloader.py::TestDataLoader::test_shuffle_batch_none, test/test_dataloader.py::TestDataLoader::test_shuffle_batch_workers, test/test_dataloader.py::TestDataLoader::test_shuffle_batch_workers_prefetch, test/test_dataloader.py::TestDataLoader::test_shuffle_pin_memory, test/test_dataloader.py::TestDataLoader::test_shuffle_reproducibility, test/test_dataloader.py::TestDataLoader::test_shuffle_workers, test/test_dataloader.py::TestDataLoader::test_timeout, test/test_dataloader.py::TestDataLoader::test_typing, test/test_dataloader.py::TestDataLoader::test_worker_seed, test/test_dataloader.py::TestDataLoader::test_worker_seed_reproducibility, test/test_dataloader.py::IntegrationTestDataLoaderDataPipe::test_shuffler_iterdatapipe, test/test_dataloader.py::TestStringDataLoader::test_shuffle_pin_memory, test/test_dataloader.py::TestDictDataLoader::test_pin_memory, test/test_dataloader.py::TestDictDataLoader::test_pin_memory_device, test/test_dataloader.py::TestDictDataLoader::test_sequential_batch, 
test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_builtin_collection_conversion, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_bulk_loading_nobatch, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_chain_iterable_style_dataset, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_bad_sequence_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_collate_shared_tensor, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_default_convert_mapping_keep_type, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_distributed_sampler_invalid_rank, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_early_exit, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_error_in_init, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_error_workers, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_excessive_thread_creation_warning, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_fd_limit_exceeded, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_growing_dataset, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_invalid_assign_after_init, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_len, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_multi_epochs_reproducibility, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_multiple_dataloaders, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_multiprocessing_contexts, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_multiprocessing_iterdatapipe_with_dill, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_no_segfault, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_numpy, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_numpy_gen_state, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_proper_exit, 
test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_random_sampler_len_with_replacement, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_random_sampler_len_without_replacement, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_sampler, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_sampler_reproducibility, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_sequential_batch, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_sequential_nonbatch, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_sequential_pin_memory, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_batch, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_shuffle_reproducibility, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_typing, test/test_dataloader.py::TestDataLoaderPersistentWorkers::test_worker_seed, test/test_dataloader.py::TestCustomPinFn::test_custom_batch_pin_worker, test/test_dataloader.py::TestIndividualWorkerQueue::test_ind_worker_queue, test/test_dataloader.py::TestSetAffinity::test_set_affinity_in_worker_init, test/test_dataloader.py::TestOutOfOrderDataLoader::test_in_order_iterable_ds, test/test_dataloader.py::TestDataLoaderDeviceTypeCUDA::test_nested_tensor_multiprocessing_context_forkserver_cuda, test/test_dataloader.py::TestDataLoaderDeviceTypeCUDA::test_sparse_tensor_multiprocessing_context_forkserver_cuda 2025-09-07T08:04:35.0611789Z 2025-09-07T08:08:02.3237507Z 2025-09-07T08:08:02.3238609Z test_quantization 2/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_2.13_8522d1db637ad137_.log 2025-09-07T08:08:02.3276677Z Running 103 items in this shard: test/test_quantization.py::TestQuantizedOps::test_adaptive_avg_pool3d_ndhwc, test/test_quantization.py::TestQuantizedOps::test_advanced_indexing, test/test_quantization.py::TestQuantizedConv::test_benchmark, 
test/test_quantization.py::TestQuantizedConv::test_qconv2d_sum_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_unpack, test/test_quantization.py::TestQuantizedConv::test_qconv_transpose2d, test/test_quantization.py::TestFakeQuantizeOps::test_backward_per_channel_cachemask_cuda, test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_channel_half_precision_numerics, test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_tensor_cachemask_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_forward_per_tensor_half_precision_numerics, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_tensor_cpu, test/test_quantization.py::TestQuantizedTensor::test_choose_qparams_optimized, test/test_quantization.py::TestQuantizedTensor::test_qtensor_per_channel_load_save, test/test_quantization.py::TestQuantizedTensor::test_repeat, test/test_quantization.py::TestObserver::test_per_channel_observers, test/test_quantization.py::TestStaticQuantizedModule::test_conv1d_api, test/test_quantization.py::TestStaticQuantizedModule::test_linear_relu, test/test_quantization.py::TestDynamicQuantizedModule::test_lstm_api, test/test_quantization.py::TestHistogramObserver::test_histogram_observer_correct_numel, test/test_quantization.py::TestHistogramObserver::test_observer_scriptable, test/test_quantization.py::TestDistributed::test_device_affinity, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_convtranspose_per_channel_qconfig_none, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_normalization, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_per_channel_linear_quantize, test/test_quantization.py::TestQuantizeEagerOps::test_conv_transpose_3d, test/test_quantization.py::TestQuantizeEagerQAT::test_defused_embedding_bag_linear, test/test_quantization.py::TestQuantizeEagerQAT::test_embedding_bag_linear, test/test_quantization.py::TestFuseEager::test_fusion_sequential_model_train, 
test/test_quantization.py::TestNumericSuiteEager::test_compare_model_outputs_lstm_dynamic, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_linear_static, test/test_quantization.py::TestBiasCorrectionEager::test_linear_chain, test/test_quantization.py::TestQuantizeFx::test_conv_linear_not_reference, test/test_quantization.py::TestQuantizeFx::test_custom_module_class_input_has_duplicate_nodes, test/test_quantization.py::TestQuantizeFx::test_dict_output, test/test_quantization.py::TestQuantizeFx::test_dynamic_with_fusion, test/test_quantization.py::TestQuantizeFx::test_linear_shape_view, test/test_quantization.py::TestQuantizeFx::test_linear_size_view, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_to_dict, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_list_args, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_tuple_args, test/test_quantization.py::TestQuantizeFx::test_qconfig_dict_setup, test/test_quantization.py::TestQuantizeFx::test_repeat_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFx::test_reshape_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFx::test_reuse_input_qconfig, test/test_quantization.py::TestQuantizeFxOps::test_elu, test/test_quantization.py::TestQuantizeFxOps::test_gelu_reference, test/test_quantization.py::TestQuantizeFxOps::test_hardswish, test/test_quantization.py::TestQuantizeFxOps::test_pixel_unshuffle, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, test/test_quantization.py::TestDuplicateDQPass::test_simple_duplicate_dq, test/test_quantization.py::TestNumericDebugger::test_simple, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_float32_int16, test/test_quantization.py::TestQuantizePT2E::test_save_load, test/test_quantization.py::TestQuantizePT2EAffineQuantization::test_dynamic_affine_act_per_channel_weights, 
test/test_quantization.py::TestXNNPACKQuantizer::test_conv_linear_no_permute, test/test_quantization.py::TestXNNPACKQuantizer::test_linear, test/test_quantization.py::TestXNNPACKQuantizer::test_qat_dynamic_linear, test/test_quantization.py::TestXNNPACKQuantizerModels::test_resnet18, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_filter_linear_recipe, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_dynamic, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_qat_conv2d_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_qconfig_with_underscores, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_transpose_bn, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_inplace_add_relu, test/test_quantization.py::TestFXGraphMatcher::test_methods, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_add_shadow_loggers_cuda, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_cuda, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_linear_fun_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_fp32_coverage, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_mod_ptq, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_ordering, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_weights_conv, test/test_quantization.py::TestFxModelReportDetector::test_simple_conv, test/test_quantization.py::TestFxModelReportObserver::test_zero_tensor_errors, test/test_quantization.py::TestFxModelReportClass::test_constructor, test/test_quantization.py::TestFxDetectInputWeightEqualization::test_input_weight_equalization_report_gen, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_branching, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_graphs, 
test/test_quantization.py::TestSerialization::test_conv2d_graph_v2, test/test_quantization.py::TestSerialization::test_conv2d_graph_v3, test/test_quantization.py::TestSerialization::test_conv2d_nobias, test/test_quantization.py::TestQuantizeJit::test_single_linear, test/test_quantization.py::TestQuantizeJit::test_skip_quant, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_propagate_observed_for_function, test/test_quantization.py::TestQuantizeJitPasses::test_replicate_dequant_same_value, test/test_quantization.py::TestQuantizeJitOps::test_general_shape_ops, test/test_quantization.py::TestQuantizeJitOps::test_group_norm, test/test_quantization.py::TestQuantizeJitOps::test_instance_norm, test/test_quantization.py::TestQuantizeJitOps::test_layer_norm, test/test_quantization.py::TestQuantizeJitOps::test_quantized_add_alpha, test/test_quantization.py::TestQuantizeJitOps::test_quantized_add_scalar_relu, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_quantize_dynamic_fp16, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_quantizable_rnn, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_conv, test/test_quantization.py::TestAOMigrationNNIntrinsic::test_modules_import_nn_intrinsic_quantized, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_match_utils, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_quantize_fx, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_soak_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_empty_cuda_float8_e5m2, test/test_quantization.py::TestFloat8DtypeCUDA::test_empty_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_save_load_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_special_numbers_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_to_string_cuda_float8_e8m0fnu 
2025-09-07T08:08:02.3303721Z 2025-09-07T08:08:02.3303951Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T08:08:02.3304375Z Uploading artifacts took 0.00 seconds 2025-09-07T08:11:00.2609183Z 2025-09-07T08:11:00.2610529Z test_quantization 1/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_1.13_b42c1b45a69dcdeb_.log 2025-09-07T08:11:00.2644196Z Running 101 items in this shard: test/test_quantization.py::TestQuantizedOps::test_avg_pool2d_nhwc, test/test_quantization.py::TestQuantizedOps::test_avg_pool3d, test/test_quantization.py::TestQuantizedOps::test_group_norm, test/test_quantization.py::TestQuantizedOps::test_qmul_broadcast, test/test_quantization.py::TestQuantizedOps::test_qmul_relu_different_qparams, test/test_quantization.py::TestQuantizedOps::test_sigmoid_non_observed, test/test_quantization.py::TestQNNPackOps::test_hardtanh, test/test_quantization.py::TestQNNPackOps::test_qnnpack_maxpool2d, test/test_quantization.py::TestQuantizedLinear::test_qlinear_with_input_q_dq_qweight_dq_output_fp32, test/test_quantization.py::TestQuantizedLinear::test_wrapped_quantized_linear_prepacked, test/test_quantization.py::TestQuantizedConv::test_qconv2d_hardswish_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv2d_relu_cudnn, test/test_quantization.py::TestDynamicQuantizedOps::test_dynamic_convtranspose2d, test/test_quantization.py::TestDynamicQuantizedOps::test_dynamic_convtranspose3d, test/test_quantization.py::TestQuantizedFunctionalOps::test_grid_sample, test/test_quantization.py::TestQuantizedTensor::test_decomposed_dequantize_per_tensor, test/test_quantization.py::TestQuantizedTensor::test_per_channel_qtensor_creation_cpu, test/test_quantization.py::TestQuantizedTensor::test_qtensor_reshape, test/test_quantization.py::TestQuantizedTensor::test_quantize_per_channel_float_qparams, 
test/test_quantization.py::TestObserver::test_histogram_observer_save_load_state_dict, test/test_quantization.py::TestStaticQuantizedModule::test_batch_norm2d_serialization, test/test_quantization.py::TestStaticQuantizedModule::test_elu, test/test_quantization.py::TestStaticQuantizedModule::test_instance_norm, test/test_quantization.py::TestDynamicQuantizedModule::test_dynamic_conv1d, test/test_quantization.py::TestDynamicQuantizedModule::test_dynamic_convtranspose3d, test/test_quantization.py::TestFusedObsFakeQuantModule::test_compare_fused_obs_fq_oss_module, test/test_quantization.py::TestFusedObsFakeQuantModule::test_fused_obs_fq_moving_avg_module, test/test_quantization.py::TestBackendConfig::test_dtype_config_from_dict, test/test_quantization.py::TestBackendConfig::test_dtype_config_to_dict, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_activations, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_manual, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_forward_hooks_preserved, test/test_quantization.py::TestQuantizeEagerQAT::test_qat_embedding_bag_errors, test/test_quantization.py::TestQuantizeEagerQATNumerics::test_linear_bn_symm_numerics, test/test_quantization.py::TestBiasCorrectionEager::test_conv_chain, test/test_quantization.py::TestFuseFx::test_fuse_conv_bn_add_relu_lowering, test/test_quantization.py::TestQuantizeFx::test_fuse_custom_config_from_dict, test/test_quantization.py::TestQuantizeFx::test_fusion_pattern_unquantized, test/test_quantization.py::TestQuantizeFx::test_linear_bn, test/test_quantization.py::TestQuantizeFx::test_linear_leaky_relu_lowering, test/test_quantization.py::TestQuantizeFx::test_lowering_functional_conv_with_kwargs, test/test_quantization.py::TestQuantizeFx::test_masked_fill_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFx::test_no_obs_between_unmatched_node_and_copy_node, test/test_quantization.py::TestQuantizeFx::test_observer_fqn, 
test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_preserved_attributes, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_dict_args, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_from_dict, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_set_module_name_regex, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_to_dict, test/test_quantization.py::TestQuantizeFx::test_remove_qconfig, test/test_quantization.py::TestQuantizeFx::test_stack_trace_preserved_linear, test/test_quantization.py::TestQuantizeFx::test_standalone_module_quantized_interface, test/test_quantization.py::TestQuantizeFx::test_static_lstm_with_custom_fixed_qparams, test/test_quantization.py::TestQuantizeFxOps::test_bmm, test/test_quantization.py::TestQuantizeFxOps::test_embedding_bag, test/test_quantization.py::TestQuantizeFxOps::test_qmatmul, test/test_quantization.py::TestQuantizeFxOps::test_ref_pattern_multi_use, test/test_quantization.py::TestQuantizeFxOps::test_sub, test/test_quantization.py::TestNumericDebugger::test_control_flow, test/test_quantization.py::TestQuantizePT2E::test_composable_quantizer_transform_for_annotation, test/test_quantization.py::TestQuantizePT2E::test_conv_padding_bn_relu, test/test_quantization.py::TestQuantizePT2E::test_observer_callback, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_bfloat16_float8_e5m2, test/test_quantization.py::TestQuantizePT2E::test_shared_qspec_transitivity, test/test_quantization.py::TestPT2ERepresentation::test_qdq, test/test_quantization.py::TestXNNPACKQuantizer::test_conv1d_with_conv2d, test/test_quantization.py::TestXNNPACKQuantizer::test_set_module_name_with_underscores, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_conv2d_binary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_qat_dynamic_quant_linear, 
test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_bn_bias_derived_qspec, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_fusion_no_conv_bias, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_relu_fusion_no_conv_bias, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_no_bias, test/test_quantization.py::TestFXGraphMatcher::test_dict_return_type, test/test_quantization.py::TestFXGraphMatcher::test_matching_failure_node_count, test/test_quantization.py::TestFXGraphMatcherModels::test_mobilenet_v2, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_add_shadow_loggers_fun_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_fqn, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_op_with_either_fp32_or_int8_input, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_op_with_only_kwargs_skips_shadowing, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_linear_mod_fp32_fp32, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_mobilenet_v2, test/test_quantization.py::TestFXNumericSuiteNShadows::test_mobilenet_v2, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_shadow_activations_lstm_dynamic, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_mobilenet_v2, test/test_quantization.py::TestFxModelReportDetector::test_fusion_layer_in_sequential, test/test_quantization.py::TestFxModelReportObserver::test_single_batch_of_ones, test/test_quantization.py::TestFxModelReportVisualizer::test_get_modules_and_features, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_results, test/test_quantization.py::TestSerialization::test_conv2d_nobias_graph_v2, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_for_general_ops, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_for_if, 
test/test_quantization.py::TestQuantizeJitOps::test_quantized_add_relu_alpha, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_convert_dynamic_fp16, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_insert_quant_dequant_linear_dynamic, test/test_quantization.py::TestDeprecatedJitQuantized::test_rnn_cell_quantized, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_quantize, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_quantize_jit, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_qat_embedding_ops, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_import, test/test_quantization.py::TestFloat8DtypeCUDA::test_to_string_cuda_float8_e4m3fn 2025-09-07T08:11:00.2670655Z 2025-09-07T08:12:26.1460181Z 2025-09-07T08:12:26.1461055Z test_quantization 9/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_9.13_621650def0a6a9f2_.log 2025-09-07T08:12:26.1488262Z Running 92 items in this shard: test/test_quantization.py::TestQuantizedOps::test_avg_pool3d_nhwc, test/test_quantization.py::TestQuantizedOps::test_instance_norm, test/test_quantization.py::TestQuantizedOps::test_mul_scalar_relu, test/test_quantization.py::TestQuantizedOps::test_qsoftmax_qnnpack, test/test_quantization.py::TestQuantizedOps::test_sigmoid, test/test_quantization.py::TestQNNPackOps::test_qnnpack_add_broadcast, test/test_quantization.py::TestQuantizedLinear::test_qlinear_relu_pt2e, test/test_quantization.py::TestQuantizedConv::test_conv_reorder_issue_onednn, test/test_quantization.py::TestQuantizedConv::test_qconv1d_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv1d_relu, test/test_quantization.py::TestQuantizedConv::test_qconv1d_relu_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_hardtanh_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv_transpose1d, 
test/test_quantization.py::TestDynamicQuantizedOps::test_qrnncell, test/test_quantization.py::TestQuantizedEmbeddingOps::test_embedding_bag_2bit, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_backward_per_tensor_cpu, test/test_quantization.py::TestFusedObsFakeQuant::test_fused_obs_fake_quant_moving_avg_per_channel, test/test_quantization.py::TestQuantizedTensor::test_bfp16_quantize, test/test_quantization.py::TestQuantizedTensor::test_decomposed_dequantize_per_channel, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_channel_bfloat16_input, test/test_quantization.py::TestQuantizedTensor::test_jit_serialization, test/test_quantization.py::TestStaticQuantizedModule::test_channel_shuffle, test/test_quantization.py::TestStaticQuantizedModule::test_embedding_bag_api, test/test_quantization.py::TestReferenceQuantizedModule::test_rnn_cell, test/test_quantization.py::TestHistogramObserver::test_histogram_observer_against_reference, test/test_quantization.py::TestHistogramObserver::test_histogram_observer_one_sided, test/test_quantization.py::TestHistogramObserver::test_histogram_observer_same_inputs, test/test_quantization.py::TestBackendConfig::test_backend_config_set_backend_pattern_config, test/test_quantization.py::TestBackendConfig::test_backend_config_set_name, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_input_type_to_index, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_root_node_getter, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_save_load_state_dict, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_embedding_bag_dynamic, test/test_quantization.py::TestQuantizeEagerOps::test_relu, test/test_quantization.py::TestQuantizeEagerQAT::test_dropout, test/test_quantization.py::TestQuantizeEagerQAT::test_forward_hooks_preserved, test/test_quantization.py::TestFuseEager::test_fuse_module_eval, 
test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_linear_dynamic, test/test_quantization.py::TestNumericSuiteEager::test_output_logger, test/test_quantization.py::TestEqualizeEager::test_converged, test/test_quantization.py::TestEqualizeEager::test_equalize, test/test_quantization.py::TestEqualizeEager::test_equalize_fused_convrelu, test/test_quantization.py::TestFuseFx::test_fusion_pattern_with_matchallnode, test/test_quantization.py::TestQuantizeFx::test_get_default_qconfig_valid_backend, test/test_quantization.py::TestQuantizeFx::test_lowering_functional_conv_transpose_with_kwargs, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_input_quantized_indexes, test/test_quantization.py::TestQuantizeFx::test_qconfig_function, test/test_quantization.py::TestQuantizeFx::test_relu_lowering, test/test_quantization.py::TestQuantizeFxOps::test_gelu_normal, test/test_quantization.py::TestQuantizeFxOps::test_general_shape_ops, test/test_quantization.py::TestQuantizeFxOps::test_mul, test/test_quantization.py::TestQuantizeFxOps::test_rnn, test/test_quantization.py::TestQuantizeFxModels::test_model_dropout, test/test_quantization.py::TestQuantizeFxModels::test_static_gpu_convert_basic, test/test_quantization.py::TestDuplicateDQPass::test_avgpool_use_different_qconfig, test/test_quantization.py::TestQuantizePT2E::test_embedding_conv_linear_quantization, test/test_quantization.py::TestQuantizePT2E::test_move_exported_model_bn_device_cpu, test/test_quantization.py::TestXNNPACKQuantizer::test_add_and_inplace_add, test/test_quantization.py::TestXNNPACKQuantizer::test_cat_same_node, test/test_quantization.py::TestXNNPACKQuantizer::test_dynamic_linear_int4_weight, test/test_quantization.py::TestXNNPACKQuantizer::test_linear_relu, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_cat_recipe, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_unary_serials, 
test/test_quantization.py::TestQuantizePT2EX86Inductor::test_lowering_to_x86, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_with_mixed_configs, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_bn_relu_fusion, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_bias_derived_qspec, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_fusion, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_add_shadow_loggers_fun_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_dynamic, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_int8_mod, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_linear_kwargs_shadow, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_loggers_preserve_qat_numerics, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_logging_inputs, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_fun_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_user_module, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_shadow_activations_conv, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_resnet18, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_sparsenn_compare_activations, test/test_quantization.py::TestQuantizeJit::test_conv_transpose, test/test_quantization.py::TestQuantizeJit::test_nested, test/test_quantization.py::TestQuantizeJitPasses::test_finalize_debug, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_for_nested_if, test/test_quantization.py::TestQuantizeJitOps::test_qbatch_norm, test/test_quantization.py::TestQuantizeJitOps::test_quantized_conv_relu, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_dynamic_with_if, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_linear, 
test/test_quantization.py::TestAOMigrationNNIntrinsic::test_modules_intrinsic_quantized_conv_relu, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_prepare, test/test_quantization.py::TestFloat8DtypeCUDA::test_cat_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_cat_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_creation_with_zeros_cuda_float8_e5m2 2025-09-07T08:12:26.1511884Z 2025-09-07T08:12:31.6177785Z 2025-09-07T08:12:31.6179246Z test_quantization 6/13 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_6.13_c7f697ac402c84b7_.log 2025-09-07T08:12:31.6204623Z Running 88 items in this shard: test/test_quantization.py::TestQuantizedOps::test_cat_nhwc, test/test_quantization.py::TestQuantizedOps::test_empty_batch, test/test_quantization.py::TestQuantizedOps::test_int8_add_onednn, test/test_quantization.py::TestQuantizedOps::test_interpolate3d, test/test_quantization.py::TestQuantizedOps::test_max_pool2d_nhwc, test/test_quantization.py::TestQuantizedOps::test_qadd_relu_cudnn, test/test_quantization.py::TestQuantizedOps::test_std, test/test_quantization.py::TestQuantizedLinear::test_qlinear_sum_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv1d_relu_cudnn, test/test_quantization.py::TestQuantizedConv::test_qconv1d_relu_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv2d_swish_fp8, test/test_quantization.py::TestDynamicQuantizedOps::test_dynamic_conv1d, test/test_quantization.py::TestDynamicQuantizedOps::test_linear_dynamic_fp16_onednn, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_channel_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_tensor_cuda, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_token, test/test_quantization.py::TestQuantizedTensor::test_qscheme_pickle, 
test/test_quantization.py::TestQuantizedTensor::test_qtensor_channel_float_assignment, test/test_quantization.py::TestQuantizedTensor::test_qtensor_index_put_cuda, test/test_quantization.py::TestQuantizedTensor::test_qtensor_legacy_new_failure, test/test_quantization.py::TestQuantizedTensor::test_qtensor_sub_byte_not_aligned_cols, test/test_quantization.py::TestQuantizedTensor::test_qtensor_unsqueeze, test/test_quantization.py::TestFakeQuantize::test_quant_min_max_override, test/test_quantization.py::TestObserver::test_observer_qparams_respects_device_affinity, test/test_quantization.py::TestObserver::test_state_dict_respects_device_affinity, test/test_quantization.py::TestStaticQuantizedModule::test_conv1d_relu_api, test/test_quantization.py::TestStaticQuantizedModule::test_conv2d_add, test/test_quantization.py::TestStaticQuantizedModule::test_prelu, test/test_quantization.py::TestDistributed::test_observers_preserve_buffers, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_root_module, test/test_quantization.py::TestUtils::test_get_fqn_to_example_inputs_complex_args, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_mha_batch_first_attr_is_copied_in_prepare, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_nested2, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_two_layers, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_quantized_rnn, test/test_quantization.py::TestQuantizeEagerOps::test_leaky_relu, test/test_quantization.py::TestModelNumericsEager::test_weight_only_activation_only_fakequant, test/test_quantization.py::TestFuseFx::test_fuse_conv_bn_relu, test/test_quantization.py::TestFuseFx::test_fuse_linear_bn_leaky_relu_onednn, test/test_quantization.py::TestFuseFx::test_fuse_linear_tanh_for_onednn_backend, test/test_quantization.py::TestQuantizeFx::test__convert_to_reference_decomposed_fx_per_channel_quant, test/test_quantization.py::TestQuantizeFx::test_attention, 
test/test_quantization.py::TestQuantizeFx::test_backend_config_scale_min, test/test_quantization.py::TestQuantizeFx::test_conv_bn_relu, test/test_quantization.py::TestQuantizeFx::test_convert_custom_config_set_preserved_attributes, test/test_quantization.py::TestQuantizeFx::test_custom_module_class_input_has_multiple_users, test/test_quantization.py::TestQuantizeFx::test_match_pattern_with_multiple_args, test/test_quantization.py::TestQuantizeFx::test_packed_weight_fused_op, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_from_dict, test/test_quantization.py::TestQuantizeFx::test_preserve_tuple, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_dict_tuple_args, test/test_quantization.py::TestQuantizeFx::test_qparams_fqn, test/test_quantization.py::TestQuantizeFx::test_ref_conv_module, test/test_quantization.py::TestQuantizeFx::test_sub_scalar, test/test_quantization.py::TestQuantizeFx::test_transpose_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFxOps::test_prelu, test/test_quantization.py::TestGraphUtils::test_conv_bn_conv_relu, test/test_quantization.py::TestDuplicateDQPass::test_no_need_for_duplicate_dq, test/test_quantization.py::TestNumericDebugger::test_quantize_pt2e_preserve_handle, test/test_quantization.py::TestQuantizePT2E::test_reentrant, test/test_quantization.py::TestPT2ERepresentation::test_add_relu, test/test_quantization.py::TestXNNPACKQuantizer::test_gru, test/test_quantization.py::TestXNNPACKQuantizer::test_set_module_name, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_conv2d_binary2, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_conv2d_binary_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_qat, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_unary_dynamic_qat, 
test/test_quantization.py::TestQuantizePT2EX86Inductor::test_maxpool2d_recipe, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_bn_fusion_cuda, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_transpose_bn_relu, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_update_shared_qspec, test/test_quantization.py::TestFXGraphMatcher::test_op_relationship_mapping, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_meth_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_unsupported_op_copy_skips_shadowing, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_insert_padding, test/test_quantization.py::TestFxModelReportObserver::test_random_epochs_and_batches, test/test_quantization.py::TestFxModelReportClass::test_qconfig_mapping_generation, test/test_quantization.py::TestQuantizeJitPasses::test_foldbn_complex_cases, test/test_quantization.py::TestQuantizeJitPasses::test_fuse_linear, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_skip_values, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_dynamic_quant_multi_uses, test/test_quantization.py::TestFusionPasses::test_quantized_add_relu_fusion, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_observer, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_quant_type, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_extremes_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_rte_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_creation_with_zeros_cuda_float8_e4m3fn 2025-09-07T08:12:31.6227091Z 2025-09-07T08:16:20.7702619Z 2025-09-07T08:16:20.7703546Z test_decomp 3/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_3.22_4b160206bd627426_.log 2025-09-07T08:16:20.7810916Z Running 388 items in 
this shard: test/test_decomp.py::TestDecompCUDA::test_cat_single_input_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rand___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_shapes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_inner_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_addcdiv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_clamp_max_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_renorm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_logsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mse_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex64 2025-09-07T08:16:20.7904913Z 2025-09-07T09:12:32.6877803Z 2025-09-07T09:12:32.6878720Z PRINTING LOG FILE of test_decomp 15/22 (test/test-reports/test_decomp_15.22_91568ce4372ebc5f_.log) 2025-09-07T09:12:32.6881216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T09:12:32.6883302Z import pkg_resources 2025-09-07T09:12:32.6884074Z Test results will be stored in test-reports/python-pytest/test_decomp/test_decomp-a42bd662c33acfa8.xml 2025-09-07T09:12:32.6884968Z ============================= test session starts ============================== 2025-09-07T09:12:32.6885794Z platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-09-07T09:12:32.6886233Z cachedir: .pytest_cache 2025-09-07T09:12:32.6886742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T09:12:32.6887292Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T09:12:32.6887564Z configfile: pytest.ini 2025-09-07T09:12:32.6888080Z plugins: cpp-2.3.0, hypothesis-5.35.1, flakefinder-1.1.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.1.0, rerunfailures-14.0, typeguard-4.3.0 2025-09-07T09:12:32.6888626Z collecting ... collected 9001 items 2025-09-07T09:12:32.6888932Z stepcurrent: Cannot find last run test, not skipping 2025-09-07T09:12:32.6996722Z Running 431 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_eval_mode_cuda_float32, 
test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float16 2025-09-07T09:12:32.7104585Z 2025-09-07T09:12:32.7105002Z test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int8 SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 0%] 2025-09-07T09:12:32.7105897Z test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 0%] 2025-09-07T09:12:32.7106803Z test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16 SKIPPED [0.0012s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 0%] 2025-09-07T09:12:32.7107719Z test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 0%] 2025-09-07T09:12:32.7108631Z test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 1%] 2025-09-07T09:12:32.7109600Z test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 1%] 2025-09-07T09:12:32.7110630Z test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 1%] 2025-09-07T09:12:32.7111594Z test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 1%] 2025-09-07T09:12:32.7112483Z test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 2%] 2025-09-07T09:12:32.7113396Z test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 2%] 
2025-09-07T09:12:32.7114296Z test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 2%] 2025-09-07T09:12:32.7115175Z test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 2%] 2025-09-07T09:12:32.7116050Z test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 3%] 2025-09-07T09:12:32.7116937Z test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 3%] 2025-09-07T09:12:32.7117836Z test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 3%] 2025-09-07T09:12:32.7118761Z test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 3%] 2025-09-07T09:12:32.7119698Z test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int8 SKIPPED [0.0002s] (Expected: new_empty_strided is not comparable) [ 3%] 2025-09-07T09:12:32.7120607Z test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 4%] 2025-09-07T09:12:32.7121618Z test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 4%] 2025-09-07T09:12:32.7122586Z test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 4%] 2025-09-07T09:12:32.7123497Z test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 4%] 
2025-09-07T09:12:32.7124488Z test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 5%] 2025-09-07T09:12:32.7125505Z test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 5%] 2025-09-07T09:12:32.7126421Z test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 5%] 2025-09-07T09:12:32.7127339Z test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_bool SKIPPED [0.0007s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 5%] 2025-09-07T09:12:32.7128258Z test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 6%] 2025-09-07T09:12:32.7129193Z test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 6%] 2025-09-07T09:12:32.7130136Z test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 6%] 2025-09-07T09:12:32.7131043Z test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 6%] 2025-09-07T09:12:32.7131944Z test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 6%] 2025-09-07T09:12:32.7132871Z test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 7%] 2025-09-07T09:12:32.7133815Z test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled 
it with PYTORCH_TEST_SKIP_FAST) [ 7%] 2025-09-07T09:12:32.7134820Z test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 7%] 2025-09-07T09:12:32.7135763Z test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 7%] 2025-09-07T09:12:32.7136690Z test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 8%] 2025-09-07T09:12:32.7137601Z test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 8%] 2025-09-07T09:12:32.7138503Z test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 8%] 2025-09-07T09:12:32.7139411Z test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 8%] 2025-09-07T09:12:32.7140323Z test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 9%] 2025-09-07T09:12:32.7141230Z test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 9%] 2025-09-07T09:12:32.7142133Z test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 9%] 2025-09-07T09:12:32.7143213Z test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 9%] 2025-09-07T09:12:32.7144126Z test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int32 SKIPPED [0.0006s] (test is fast; 
we disabled it with PYTORCH_TEST_SKIP_FAST) [ 9%] 2025-09-07T09:12:32.7145148Z test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 10%] 2025-09-07T09:12:32.7146151Z test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 10%] 2025-09-07T09:12:32.7147058Z test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 10%] 2025-09-07T09:12:32.7148004Z test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 10%] 2025-09-07T09:12:32.7148953Z test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 11%] 2025-09-07T09:12:32.7149863Z test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 11%] 2025-09-07T09:12:32.7150800Z test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 11%] 2025-09-07T09:12:32.7151716Z test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 11%] 2025-09-07T09:12:32.7152641Z test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 12%] 2025-09-07T09:12:32.7153592Z test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 12%] 2025-09-07T09:12:32.7154495Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 12%] 2025-09-07T09:12:32.7155386Z test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 12%] 2025-09-07T09:12:32.7156293Z test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 12%] 2025-09-07T09:12:32.7157198Z test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 13%] 2025-09-07T09:12:32.7158106Z test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 13%] 2025-09-07T09:12:32.7159043Z test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 13%] 2025-09-07T09:12:32.7159995Z test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 13%] 2025-09-07T09:12:32.7160958Z test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 14%] 2025-09-07T09:12:32.7161917Z test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 14%] 2025-09-07T09:12:32.7162833Z test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 14%] 2025-09-07T09:12:32.7163868Z test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 
14%] 2025-09-07T09:12:32.7164778Z test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 15%] 2025-09-07T09:12:32.7165771Z test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 15%] 2025-09-07T09:12:32.7166745Z test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 15%] 2025-09-07T09:12:32.7167643Z test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 15%] 2025-09-07T09:12:32.7168541Z test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 16%] 2025-09-07T09:12:32.7169440Z test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 16%] 2025-09-07T09:12:32.7170352Z test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 16%] 2025-09-07T09:12:32.7171241Z test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 16%] 2025-09-07T09:12:32.7172138Z test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 16%] 2025-09-07T09:12:32.7173051Z test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 17%] 2025-09-07T09:12:32.7173899Z test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float32 PASSED [2118.8847s] [ 17%] 
2025-09-07T09:12:32.7174672Z test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool SKIPPED [0.0007s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 17%] 2025-09-07T09:12:32.7175583Z test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 17%] 2025-09-07T09:12:32.7176523Z test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 18%] 2025-09-07T09:12:32.7177455Z test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 18%] 2025-09-07T09:12:32.7178379Z test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 18%] 2025-09-07T09:12:32.7179313Z test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 18%] 2025-09-07T09:12:32.7180228Z test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 19%] 2025-09-07T09:12:32.7181181Z test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 19%] 2025-09-07T09:12:32.7182157Z test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 19%] 2025-09-07T09:12:32.7183106Z test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 19%] 2025-09-07T09:12:32.7184199Z test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int32 SKIPPED [0.0006s] (test is 
fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 19%] 2025-09-07T09:12:32.7185120Z test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 20%] 2025-09-07T09:12:32.7186051Z test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 20%] 2025-09-07T09:12:32.7187051Z test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 20%] 2025-09-07T09:12:32.7188045Z test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 20%] 2025-09-07T09:12:32.7188959Z test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 21%] 2025-09-07T09:12:32.7189872Z test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 21%] 2025-09-07T09:12:32.7190771Z test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 21%] 2025-09-07T09:12:32.7191758Z test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 21%] 2025-09-07T09:12:32.7192835Z test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 22%] 2025-09-07T09:12:32.7193909Z test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 22%] 
2025-09-07T09:12:32.7194976Z test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 22%] 2025-09-07T09:12:32.7195996Z test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 22%] 2025-09-07T09:12:32.7196944Z test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 22%] 2025-09-07T09:12:32.7197845Z test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 23%] 2025-09-07T09:12:32.7198737Z test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 23%] 2025-09-07T09:12:32.7199643Z test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 23%] 2025-09-07T09:12:32.7200574Z test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 23%] 2025-09-07T09:12:32.7201493Z test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 24%] 2025-09-07T09:12:32.7202433Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 24%] 2025-09-07T09:12:32.7203410Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 24%] 2025-09-07T09:12:32.7204361Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8 SKIPPED 
[0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 24%] 2025-09-07T09:12:32.7205466Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 25%] 2025-09-07T09:12:32.7206460Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 25%] 2025-09-07T09:12:32.7207497Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 25%] 2025-09-07T09:12:32.7208492Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 25%] 2025-09-07T09:12:32.7209488Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 25%] 2025-09-07T09:12:32.7210517Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 26%] 2025-09-07T09:12:32.7211471Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 26%] 2025-09-07T09:12:32.7212404Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 26%] 2025-09-07T09:12:32.7213190Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex128 PASSED [73.5619s] [ 26%] 2025-09-07T09:12:32.7214111Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex128 SKIPPED [0.0007s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 27%] 2025-09-07T09:12:32.7215104Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 27%] 2025-09-07T09:12:32.7216056Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 27%] 2025-09-07T09:12:32.7217023Z test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 27%] 2025-09-07T09:12:32.7217986Z test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 28%] 2025-09-07T09:12:32.7218900Z test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 28%] 2025-09-07T09:12:32.7219789Z test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8 SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 28%] 2025-09-07T09:12:32.7220709Z test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 28%] 2025-09-07T09:12:32.7221655Z test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 29%] 2025-09-07T09:12:32.7222599Z test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 29%] 2025-09-07T09:12:32.7223586Z test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 29%] 2025-09-07T09:12:32.7224608Z test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int8 SKIPPED [0.0006s] 
(test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 29%] 2025-09-07T09:12:32.7225650Z test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 29%] 2025-09-07T09:12:32.7226651Z test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 30%] 2025-09-07T09:12:32.7227551Z test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16 SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 30%] 2025-09-07T09:12:32.7228496Z test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 30%] 2025-09-07T09:12:32.7229472Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 30%] 2025-09-07T09:12:32.7230424Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 31%] 2025-09-07T09:12:32.7231396Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 31%] 2025-09-07T09:12:32.7232376Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 31%] 2025-09-07T09:12:32.7233323Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 31%] 2025-09-07T09:12:32.7234259Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 32%] 2025-09-07T09:12:32.7235207Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex128 SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 32%] 2025-09-07T09:12:32.7236152Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 32%] 2025-09-07T09:12:32.7237070Z test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 32%] 2025-09-07T09:12:32.7237994Z test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 32%] 2025-09-07T09:12:32.7239004Z test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 33%] 2025-09-07T09:12:32.7240042Z test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 33%] 2025-09-07T09:12:32.7241037Z test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 33%] 2025-09-07T09:12:32.7241996Z test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 33%] 2025-09-07T09:12:32.7242901Z test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_bool SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 34%] 2025-09-07T09:12:32.7243854Z test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 34%] 2025-09-07T09:12:32.7244813Z test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32 
SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 34%] 2025-09-07T09:12:32.7245719Z test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 34%] 2025-09-07T09:12:32.7246753Z test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 35%] 2025-09-07T09:12:32.7247662Z test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 35%] 2025-09-07T09:12:32.7248560Z test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 35%] 2025-09-07T09:12:32.7249529Z test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 35%] 2025-09-07T09:12:32.7250479Z test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int16 SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 35%] 2025-09-07T09:12:32.7251398Z test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 36%] 2025-09-07T09:12:32.7252365Z test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 36%] 2025-09-07T09:12:32.7253327Z test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 36%] 2025-09-07T09:12:32.7254344Z test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 36%] 2025-09-07T09:12:32.7255254Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 37%] 2025-09-07T09:12:32.7256180Z test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bfloat16 SKIPPED [0.0002s] (Expected: new_empty_strided is not comparable) [ 37%] 2025-09-07T09:12:32.7257115Z test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 37%] 2025-09-07T09:12:32.7258117Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED [0.0008s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 37%] 2025-09-07T09:12:32.7259208Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 38%] 2025-09-07T09:12:32.7260271Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 38%] 2025-09-07T09:12:32.7261309Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 38%] 2025-09-07T09:12:32.7262193Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32 PASSED [293.7187s] [ 38%] 2025-09-07T09:12:32.7263083Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16 SKIPPED [0.0007s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 38%] 2025-09-07T09:12:32.7264113Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 39%] 2025-09-07T09:12:32.7265158Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 39%] 2025-09-07T09:12:32.7266243Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 39%] 2025-09-07T09:12:32.7267422Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 39%] 2025-09-07T09:12:32.7268556Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 40%] 2025-09-07T09:12:32.7269617Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 40%] 2025-09-07T09:12:32.7270723Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 40%] 2025-09-07T09:12:32.7271866Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 40%] 2025-09-07T09:12:32.7272959Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 41%] 2025-09-07T09:12:32.7274000Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 41%] 2025-09-07T09:12:32.7274997Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with 
PYTORCH_TEST_SKIP_FAST) [ 41%] 2025-09-07T09:12:32.7276032Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 41%] 2025-09-07T09:12:32.7277114Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 41%] 2025-09-07T09:12:32.7278177Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 42%] 2025-09-07T09:12:32.7279223Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 42%] 2025-09-07T09:12:32.7280258Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 42%] 2025-09-07T09:12:32.7281310Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 42%] 2025-09-07T09:12:32.7282437Z test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 43%] 2025-09-07T09:12:32.7283514Z test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 43%] 2025-09-07T09:12:32.7284464Z test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 43%] 2025-09-07T09:12:32.7285429Z test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex64 SKIPPED 
[0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 43%] 2025-09-07T09:12:32.7286403Z test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 44%] 2025-09-07T09:12:32.7287349Z test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 44%] 2025-09-07T09:12:32.7288307Z test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 44%] 2025-09-07T09:12:32.7289499Z test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 44%] 2025-09-07T09:12:32.7290409Z test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 45%] 2025-09-07T09:12:32.7291394Z test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 45%] 2025-09-07T09:12:32.7292422Z test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 45%] 2025-09-07T09:12:32.7293423Z test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 45%] 2025-09-07T09:12:32.7294533Z test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 45%] 2025-09-07T09:12:32.7295558Z test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 46%] 
2025-09-07T09:12:32.7296574Z test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 46%] 2025-09-07T09:12:32.7297570Z test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 46%] 2025-09-07T09:12:32.7298519Z test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 46%] 2025-09-07T09:12:32.7299428Z test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 47%] 2025-09-07T09:12:32.7300350Z test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 47%] 2025-09-07T09:12:32.7301278Z test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 47%] 2025-09-07T09:12:32.7302202Z test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 47%] 2025-09-07T09:12:32.7303126Z test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 48%] 2025-09-07T09:12:32.7304059Z test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 48%] 2025-09-07T09:12:32.7305006Z test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 48%] 2025-09-07T09:12:32.7305937Z test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32 SKIPPED [0.0006s] (test is 
fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 48%] 2025-09-07T09:12:32.7306865Z test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 48%] 2025-09-07T09:12:32.7307796Z test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 49%] 2025-09-07T09:12:32.7308708Z test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 49%] 2025-09-07T09:12:32.7309733Z test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 49%] 2025-09-07T09:12:32.7310764Z test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 49%] 2025-09-07T09:12:32.7311747Z test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 50%] 2025-09-07T09:12:32.7312752Z test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 50%] 2025-09-07T09:12:32.7313755Z test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 50%] 2025-09-07T09:12:32.7314664Z test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 50%] 2025-09-07T09:12:32.7315554Z test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 51%] 2025-09-07T09:12:32.7316448Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 51%] 2025-09-07T09:12:32.7317384Z test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 51%] 2025-09-07T09:12:32.7318337Z test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 51%] 2025-09-07T09:12:32.7319274Z test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 51%] 2025-09-07T09:12:32.7320246Z test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 52%] 2025-09-07T09:12:32.7321203Z test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 52%] 2025-09-07T09:12:32.7322138Z test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 52%] 2025-09-07T09:12:32.7323097Z test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 52%] 2025-09-07T09:12:32.7324075Z test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 53%] 2025-09-07T09:12:32.7325052Z test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 53%] 2025-09-07T09:12:32.7326040Z test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int32 SKIPPED 
[0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 53%] 2025-09-07T09:12:32.7326992Z test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 53%] 2025-09-07T09:12:32.7327934Z test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 54%] 2025-09-07T09:12:32.7328866Z test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 54%] 2025-09-07T09:12:32.7329767Z test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 54%] 2025-09-07T09:12:32.7330883Z test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 54%] 2025-09-07T09:12:32.7331918Z test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 54%] 2025-09-07T09:12:32.7332926Z test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 55%] 2025-09-07T09:12:32.7334071Z test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 55%] 2025-09-07T09:12:32.7335067Z test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 55%] 2025-09-07T09:12:32.7335969Z test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 55%] 2025-09-07T09:12:32.7336880Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 56%] 2025-09-07T09:12:32.7337829Z test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 56%] 2025-09-07T09:12:32.7338763Z test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 56%] 2025-09-07T09:12:32.7339653Z test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 56%] 2025-09-07T09:12:32.7340628Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 57%] 2025-09-07T09:12:32.7341690Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 57%] 2025-09-07T09:12:32.7342746Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 57%] 2025-09-07T09:12:32.7343749Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 57%] 2025-09-07T09:12:32.7344753Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 58%] 2025-09-07T09:12:32.7345816Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 58%] 2025-09-07T09:12:32.7346875Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 58%] 2025-09-07T09:12:32.7347868Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 58%] 2025-09-07T09:12:32.7348858Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 58%] 2025-09-07T09:12:32.7349877Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 59%] 2025-09-07T09:12:32.7350876Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 59%] 2025-09-07T09:12:32.7352058Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 59%] 2025-09-07T09:12:32.7353085Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 59%] 2025-09-07T09:12:32.7354071Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 60%] 2025-09-07T09:12:32.7355068Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 60%] 2025-09-07T09:12:32.7356160Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 60%] 2025-09-07T09:12:32.7357266Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 60%] 2025-09-07T09:12:32.7358328Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 61%] 2025-09-07T09:12:32.7359320Z test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 61%] 2025-09-07T09:12:32.7360254Z test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 61%] 2025-09-07T09:12:32.7361188Z test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 61%] 2025-09-07T09:12:32.7362171Z test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 61%] 2025-09-07T09:12:32.7363162Z test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 62%] 2025-09-07T09:12:32.7364140Z test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 62%] 2025-09-07T09:12:32.7365106Z test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 62%] 2025-09-07T09:12:32.7366047Z test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 62%] 2025-09-07T09:12:32.7366991Z 
test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 63%] 2025-09-07T09:12:32.7367963Z test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 63%] 2025-09-07T09:12:32.7368878Z test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 63%] 2025-09-07T09:12:32.7369781Z test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 63%] 2025-09-07T09:12:32.7370694Z test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 64%] 2025-09-07T09:12:32.7371587Z test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 64%] 2025-09-07T09:12:32.7372553Z test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 64%] 2025-09-07T09:12:32.7373506Z test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 64%] 2025-09-07T09:12:32.7374504Z test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 64%] 2025-09-07T09:12:32.7375500Z test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 65%] 2025-09-07T09:12:32.7376511Z test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with 
PYTORCH_TEST_SKIP_FAST) [ 65%] 2025-09-07T09:12:32.7377459Z test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 65%] 2025-09-07T09:12:32.7378402Z test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 65%] 2025-09-07T09:12:32.7379330Z test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 66%] 2025-09-07T09:12:32.7380236Z test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 66%] 2025-09-07T09:12:32.7381132Z test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 66%] 2025-09-07T09:12:32.7382040Z test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 66%] 2025-09-07T09:12:32.7382962Z test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 67%] 2025-09-07T09:12:32.7383912Z test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 67%] 2025-09-07T09:12:32.7384883Z test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 67%] 2025-09-07T09:12:32.7385835Z test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 67%] 2025-09-07T09:12:32.7386759Z test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8 
SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 67%] 2025-09-07T09:12:32.7387723Z test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 68%] 2025-09-07T09:12:32.7388682Z test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 68%] 2025-09-07T09:12:32.7389596Z test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 68%] 2025-09-07T09:12:32.7390531Z test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 68%] 2025-09-07T09:12:32.7391471Z test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 69%] 2025-09-07T09:12:32.7392392Z test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 69%] 2025-09-07T09:12:32.7393395Z test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 69%] 2025-09-07T09:12:32.7394369Z test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 69%] 2025-09-07T09:12:32.7395264Z test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7396232Z test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7397242Z 
test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7398196Z test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7399207Z test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7400156Z test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 71%] 2025-09-07T09:12:32.7400992Z test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 71%] 2025-09-07T09:12:32.7401820Z test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 71%] 2025-09-07T09:12:32.7402650Z test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 71%] 2025-09-07T09:12:32.7403498Z test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 72%] 2025-09-07T09:12:32.7404358Z test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 72%] 2025-09-07T09:12:32.7405211Z test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 72%] 2025-09-07T09:12:32.7406077Z test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 72%] 2025-09-07T09:12:32.7406950Z 
test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 73%] 2025-09-07T09:12:32.7407797Z test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 73%] 2025-09-07T09:12:32.7408638Z test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 73%] 2025-09-07T09:12:32.7409481Z test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 73%] 2025-09-07T09:12:32.7410323Z test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 74%] 2025-09-07T09:12:32.7411192Z test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 74%] 2025-09-07T09:12:32.7411926Z test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64 2025-09-07T09:12:32.7412264Z 2025-09-07T09:12:32.7412672Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_decomp/test_decomp-a42bd662c33acfa8.xml - 2025-09-07T09:12:32.7413378Z !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T09:12:32.7414064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:841: KeyboardInterrupt 2025-09-07T09:12:32.7414624Z (to show a full traceback on KeyboardInterrupt use --full-trace) 2025-09-07T09:12:32.7415031Z ================= 3 passed, 317 skipped in 3596.03s (0:59:56) ================== 2025-09-07T09:12:32.7415350Z Got exit code 2 2025-09-07T09:12:32.7415557Z Retrying single test... 
2025-09-07T09:12:32.7416797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
2025-09-07T09:12:32.7418042Z import pkg_resources
2025-09-07T09:12:32.7418466Z Test results will be stored in test-reports/python-pytest/test_decomp/test_decomp-2c9c0a953fbffbed.xml
2025-09-07T09:12:32.7418972Z ============================= test session starts ==============================
2025-09-07T09:12:32.7419449Z platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-09-07T09:12:32.7419873Z cachedir: .pytest_cache
2025-09-07T09:12:32.7420373Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-09-07T09:12:32.7420917Z rootdir: /var/lib/jenkins/pytorch
2025-09-07T09:12:32.7421170Z configfile: pytest.ini
2025-09-07T09:12:32.7421675Z plugins: cpp-2.3.0, hypothesis-5.35.1, flakefinder-1.1.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.1.0, rerunfailures-14.0, typeguard-4.3.0
2025-09-07T09:12:32.7422292Z collecting ... collected 9001 items / 430 deselected / 8571 selected
2025-09-07T09:12:32.7422998Z stepcurrent: skipping 320 already run items.
Running only test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64
2025-09-07T09:12:32.7423614Z Running 1 items in this shard
2025-09-07T09:12:32.7423772Z
2025-09-07T09:12:32.7424095Z test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64 PASSED [1350.0156s] [100%]
2025-09-07T09:12:32.7424507Z
2025-09-07T09:12:32.7424900Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_decomp/test_decomp-2c9c0a953fbffbed.xml -
2025-09-07T09:12:32.7425534Z ================ 1 passed, 430 deselected in 1350.66s (0:22:30) ================
2025-09-07T09:12:32.7425853Z Got exit code 0
2025-09-07T09:12:32.7426155Z Test succeeeded in new process, continuing with the rest of the tests
2025-09-07T09:12:32.7427419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
2025-09-07T09:12:32.7428570Z import pkg_resources 2025-09-07T09:12:32.7428992Z Test results will be stored in test-reports/python-pytest/test_decomp/test_decomp-c59fc645435aed77.xml 2025-09-07T09:12:32.7429487Z ============================= test session starts ============================== 2025-09-07T09:12:32.7429951Z platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-09-07T09:12:32.7430370Z cachedir: .pytest_cache 2025-09-07T09:12:32.7430863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T09:12:32.7431403Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T09:12:32.7431647Z configfile: pytest.ini 2025-09-07T09:12:32.7432139Z plugins: cpp-2.3.0, hypothesis-5.35.1, flakefinder-1.1.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.1.0, rerunfailures-14.0, typeguard-4.3.0 2025-09-07T09:12:32.7432754Z collecting ... collected 9001 items / 321 deselected / 8680 selected 2025-09-07T09:12:32.7433213Z stepcurrent: skipping 321 already run items. 2025-09-07T09:12:32.7433490Z Running 110 items in this shard 2025-09-07T09:12:32.7433710Z 2025-09-07T09:12:32.7434034Z test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64 SKIPPED [0.0003s] (Skipped!) 
[ 0%] 2025-09-07T09:12:32.7434891Z test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64 SKIPPED [0.0007s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 1%] 2025-09-07T09:12:32.7435984Z test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardshrink_cuda_float64 SKIPPED [0.0012s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 2%] 2025-09-07T09:12:32.7436975Z test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool3d_grad_cuda_float64 PASSED [321.1630s] [ 3%] 2025-09-07T09:12:32.7437783Z test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool SKIPPED [0.0007s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 4%] 2025-09-07T09:12:32.7438624Z test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 5%] 2025-09-07T09:12:32.7439471Z test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 6%] 2025-09-07T09:12:32.7440325Z test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 7%] 2025-09-07T09:12:32.7441176Z test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 8%] 2025-09-07T09:12:32.7442024Z test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 9%] 2025-09-07T09:12:32.7442906Z test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 10%] 2025-09-07T09:12:32.7443785Z test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 10%] 2025-09-07T09:12:32.7444611Z 
test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 11%] 2025-09-07T09:12:32.7445439Z test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 12%] 2025-09-07T09:12:32.7446268Z test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 13%] 2025-09-07T09:12:32.7447097Z test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 14%] 2025-09-07T09:12:32.7447925Z test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 15%] 2025-09-07T09:12:32.7448793Z test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 16%] 2025-09-07T09:12:32.7449671Z test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 17%] 2025-09-07T09:12:32.7450545Z test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 18%] 2025-09-07T09:12:32.7451419Z test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 19%] 2025-09-07T09:12:32.7452235Z test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 20%] 2025-09-07T09:12:32.7453078Z test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 20%] 2025-09-07T09:12:32.7454180Z test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128 SKIPPED 
[0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 21%] 2025-09-07T09:12:32.7455057Z test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 22%] 2025-09-07T09:12:32.7456026Z test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 23%] 2025-09-07T09:12:32.7456964Z test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 24%] 2025-09-07T09:12:32.7457827Z test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 25%] 2025-09-07T09:12:32.7458689Z test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 26%] 2025-09-07T09:12:32.7459542Z test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 27%] 2025-09-07T09:12:32.7460390Z test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 28%] 2025-09-07T09:12:32.7461239Z test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 29%] 2025-09-07T09:12:32.7462100Z test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 30%] 2025-09-07T09:12:32.7462937Z test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 30%] 2025-09-07T09:12:32.7463766Z test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with 
PYTORCH_TEST_SKIP_FAST) [ 31%] 2025-09-07T09:12:32.7464573Z test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 32%] 2025-09-07T09:12:32.7465401Z test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 33%] 2025-09-07T09:12:32.7466274Z test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 34%] 2025-09-07T09:12:32.7467172Z test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 35%] 2025-09-07T09:12:32.7468049Z test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 36%] 2025-09-07T09:12:32.7468911Z test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 37%] 2025-09-07T09:12:32.7469762Z test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 38%] 2025-09-07T09:12:32.7470614Z test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 39%] 2025-09-07T09:12:32.7471486Z test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 40%] 2025-09-07T09:12:32.7472346Z test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 40%] 2025-09-07T09:12:32.7473173Z test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 41%] 2025-09-07T09:12:32.7474179Z 
test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 42%] 2025-09-07T09:12:32.7475055Z test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 43%] 2025-09-07T09:12:32.7476021Z test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 44%] 2025-09-07T09:12:32.7477046Z test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 45%] 2025-09-07T09:12:32.7477940Z test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 46%] 2025-09-07T09:12:32.7478798Z test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 47%] 2025-09-07T09:12:32.7479681Z test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 48%] 2025-09-07T09:12:32.7480542Z test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 49%] 2025-09-07T09:12:32.7481412Z test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16 SKIPPED [0.0002s] (Expected: new_empty_strided is not comparable) [ 50%] 2025-09-07T09:12:32.7482270Z test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 50%] 2025-09-07T09:12:32.7483116Z test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 51%] 2025-09-07T09:12:32.7484031Z 
test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 52%] 2025-09-07T09:12:32.7485005Z test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 53%] 2025-09-07T09:12:32.7485971Z test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 54%] 2025-09-07T09:12:32.7486965Z test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 55%] 2025-09-07T09:12:32.7487939Z test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 56%] 2025-09-07T09:12:32.7488863Z test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 57%] 2025-09-07T09:12:32.7489812Z test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 58%] 2025-09-07T09:12:32.7490715Z test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 59%] 2025-09-07T09:12:32.7491576Z test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 60%] 2025-09-07T09:12:32.7492452Z test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 60%] 2025-09-07T09:12:32.7493300Z test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with 
PYTORCH_TEST_SKIP_FAST) [ 61%] 2025-09-07T09:12:32.7494364Z test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 62%] 2025-09-07T09:12:32.7495208Z test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 63%] 2025-09-07T09:12:32.7496058Z test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 64%] 2025-09-07T09:12:32.7496973Z test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 65%] 2025-09-07T09:12:32.7497966Z test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 66%] 2025-09-07T09:12:32.7498869Z test_decomp.py::TestDecompCUDA::test_quick_select_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 67%] 2025-09-07T09:12:32.7499719Z test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 68%] 2025-09-07T09:12:32.7500544Z test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 69%] 2025-09-07T09:12:32.7501378Z test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7502218Z test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 70%] 2025-09-07T09:12:32.7503056Z test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 71%] 2025-09-07T09:12:32.7503923Z 
test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 72%] 2025-09-07T09:12:32.7504784Z test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 73%] 2025-09-07T09:12:32.7505633Z test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 74%] 2025-09-07T09:12:32.7506522Z test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 75%] 2025-09-07T09:12:32.7507409Z test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 76%] 2025-09-07T09:12:32.7508283Z test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 77%] 2025-09-07T09:12:32.7509162Z test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 78%] 2025-09-07T09:12:32.7510059Z test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 79%] 2025-09-07T09:12:32.7510928Z test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 80%] 2025-09-07T09:12:32.7511830Z test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 80%] 2025-09-07T09:12:32.7512720Z test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 81%] 2025-09-07T09:12:32.7513558Z 
test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 82%] 2025-09-07T09:12:32.7514556Z test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 83%] 2025-09-07T09:12:32.7515386Z test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 84%] 2025-09-07T09:12:32.7516211Z test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 85%] 2025-09-07T09:12:32.7517118Z test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 86%] 2025-09-07T09:12:32.7518071Z test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 87%] 2025-09-07T09:12:32.7518944Z test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 88%] 2025-09-07T09:12:32.7519775Z test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_uint8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 89%] 2025-09-07T09:12:32.7520602Z test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bool SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 90%] 2025-09-07T09:12:32.7521463Z test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 90%] 2025-09-07T09:12:32.7522344Z test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 91%] 2025-09-07T09:12:32.7523206Z test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_float16 
SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 92%] 2025-09-07T09:12:32.7524080Z test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 93%] 2025-09-07T09:12:32.7524970Z test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 94%] 2025-09-07T09:12:32.7525845Z test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 95%] 2025-09-07T09:12:32.7526707Z test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 96%] 2025-09-07T09:12:32.7527566Z test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 97%] 2025-09-07T09:12:32.7528417Z test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 98%] 2025-09-07T09:12:32.7529328Z test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_eval_mode_cuda_float32 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [ 99%] 2025-09-07T09:12:32.7530389Z test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float16 SKIPPED [0.0006s] (test is fast; we disabled it with PYTORCH_TEST_SKIP_FAST) [100%] 2025-09-07T09:12:32.7531006Z 2025-09-07T09:12:32.7531408Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_decomp/test_decomp-c59fc645435aed77.xml - 2025-09-07T09:12:32.7532055Z ========== 1 passed, 109 skipped, 321 deselected in 321.99s (0:05:21) ========== 2025-09-07T09:12:32.7532791Z The following tests failed and then succeeded when run in a new 
process['test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64']
2025-09-07T09:12:32.7533376Z
2025-09-07T09:12:32.7533672Z FINISHED PRINTING LOG FILE of test_decomp 15/22 (test/test-reports/test_decomp_15.22_91568ce4372ebc5f_.log)
2025-09-07T09:12:32.7534189Z
2025-09-07T09:12:32.7534406Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading
2025-09-07T09:12:32.7534908Z Uploading artifacts took 0.00 seconds
2025-09-07T09:12:33.6297045Z Running test batch 'tests to run' cost 9010.96 seconds
2025-09-07T09:12:34.7263509Z
2025-09-07T09:12:34.7263830Z real 150m17.076s
2025-09-07T09:12:34.7264289Z user 3308m17.497s
2025-09-07T09:12:34.7264707Z sys 197m42.547s
2025-09-07T09:12:34.7265121Z + assert_git_not_dirty
2025-09-07T09:12:34.7265628Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]]
2025-09-07T09:12:34.7266632Z + test_aten
2025-09-07T09:12:34.7267285Z + echo 'Running ATen tests with pytorch lib'
2025-09-07T09:12:34.7267876Z Running ATen tests with pytorch lib
2025-09-07T09:12:34.7268315Z + [[ -n '' ]]
2025-09-07T09:12:34.7268676Z + echo 'Running test with the build folder'
2025-09-07T09:12:34.7269148Z Running test with the build folder
2025-09-07T09:12:34.7269578Z + TEST_BASE_DIR=build/bin
2025-09-07T09:12:34.7270735Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libc10_hip.so build/bin
2025-09-07T09:12:34.7290022Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libcaffe2_nvrtc.so build/bin
2025-09-07T09:12:34.7314531Z + ln -sf '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libmkldnn*' build/bin
2025-09-07T09:12:34.7337745Z + ln -sf '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libnccl*' build/bin
2025-09-07T09:12:34.7358954Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch.so
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_hip.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so build/bin 2025-09-07T09:12:34.7377186Z + ls build/bin 2025-09-07T09:12:34.7406694Z BackoffTest 2025-09-07T09:12:34.7407036Z CMakeFiles 2025-09-07T09:12:34.7407368Z CTestTestfile.cmake 2025-09-07T09:12:34.7407810Z CppSignature_test 2025-09-07T09:12:34.7408162Z Dict_test 2025-09-07T09:12:34.7408526Z Dimname_test 2025-09-07T09:12:34.7408857Z FileStoreTest 2025-09-07T09:12:34.7409172Z HashStoreTest 2025-09-07T09:12:34.7409485Z IListRef_test 2025-09-07T09:12:34.7409804Z KernelFunction_test 2025-09-07T09:12:34.7410166Z List_test 2025-09-07T09:12:34.7410464Z MaybeOwned_test 2025-09-07T09:12:34.7410802Z NamedTensor_test 2025-09-07T09:12:34.7411156Z ProcessGroupGlooTest 2025-09-07T09:12:34.7411526Z StorageUtils_test 2025-09-07T09:12:34.7411859Z TCPStoreTest 2025-09-07T09:12:34.7412189Z apply_utils_test 2025-09-07T09:12:34.7412505Z atest 2025-09-07T09:12:34.7412821Z backend_fallback_test 2025-09-07T09:12:34.7413173Z basic 2025-09-07T09:12:34.7413503Z broadcast_test 2025-09-07T09:12:34.7414044Z c10_AllocatorConfig_test 2025-09-07T09:12:34.7414496Z c10_ArrayRef_test 2025-09-07T09:12:34.7414879Z c10_Bitset_test 2025-09-07T09:12:34.7415323Z c10_CompileTimeFunctionPointer_test 2025-09-07T09:12:34.7415865Z c10_ConstexprCrc_test 2025-09-07T09:12:34.7416310Z c10_DeadlockDetection_test 2025-09-07T09:12:34.7416768Z c10_DeviceGuard_test 2025-09-07T09:12:34.7417188Z c10_Device_test 2025-09-07T09:12:34.7417542Z c10_DispatchKeySet_test 2025-09-07T09:12:34.7417802Z c10_Enumerate_test 2025-09-07T09:12:34.7418020Z c10_Half_test 2025-09-07T09:12:34.7418224Z c10_InlineDeviceGuard_test 
2025-09-07T09:12:34.7418464Z c10_InlineStreamGuard_test 2025-09-07T09:12:34.7418692Z c10_IntrusiveList_test 2025-09-07T09:12:34.7418904Z c10_LeftRight_test 2025-09-07T09:12:34.7419118Z c10_Metaprogramming_test 2025-09-07T09:12:34.7419348Z c10_NetworkFlow_test 2025-09-07T09:12:34.7419562Z c10_Scalar_test 2025-09-07T09:12:34.7419763Z c10_Semaphore_test 2025-09-07T09:12:34.7420130Z c10_SizesAndStrides_test 2025-09-07T09:12:34.7420360Z c10_StreamGuard_test 2025-09-07T09:12:34.7420571Z c10_SymInt_test 2025-09-07T09:12:34.7420875Z c10_Synchronized_test 2025-09-07T09:12:34.7421102Z c10_ThreadLocal_test 2025-09-07T09:12:34.7421322Z c10_TypeIndex_test 2025-09-07T09:12:34.7421528Z c10_TypeList_test 2025-09-07T09:12:34.7421731Z c10_TypeTraits_test 2025-09-07T09:12:34.7421946Z c10_accumulate_test 2025-09-07T09:12:34.7422154Z c10_bfloat16_test 2025-09-07T09:12:34.7422343Z c10_bit_cast_test 2025-09-07T09:12:34.7422543Z c10_complex_math_test 2025-09-07T09:12:34.7422842Z c10_complex_test 2025-09-07T09:12:34.7423123Z c10_cow_test 2025-09-07T09:12:34.7423316Z c10_error_test 2025-09-07T09:12:34.7423536Z c10_exception_test 2025-09-07T09:12:34.7423804Z c10_flags_test 2025-09-07T09:12:34.7424040Z c10_generic_math_test 2025-09-07T09:12:34.7424319Z c10_hip_HIPAssertionsTest_1_var_test 2025-09-07T09:12:34.7424667Z c10_hip_HIPAssertionsTest_catches_stream 2025-09-07T09:12:34.7425101Z c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-09-07T09:12:34.7425542Z c10_hip_HIPAssertionsTest_from_2_processes 2025-09-07T09:12:34.7425989Z c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 2025-09-07T09:12:34.7426515Z c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-09-07T09:12:34.7426990Z c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-09-07T09:12:34.7427368Z c10_hip_HIPTest 2025-09-07T09:12:34.7427623Z c10_intrusive_ptr_benchmark 2025-09-07T09:12:34.7427909Z c10_intrusive_ptr_test 2025-09-07T09:12:34.7428164Z c10_irange_test 
2025-09-07T09:12:34.7428397Z c10_lazy_test 2025-09-07T09:12:34.7428625Z c10_logging_test 2025-09-07T09:12:34.7428863Z c10_optional_test 2025-09-07T09:12:34.7429118Z c10_ordered_preserving_dict_test 2025-09-07T09:12:34.7429418Z c10_registry_test 2025-09-07T09:12:34.7429653Z c10_small_vector_test 2025-09-07T09:12:34.7429900Z c10_ssize_test 2025-09-07T09:12:34.7430134Z c10_string_util_test 2025-09-07T09:12:34.7430388Z c10_string_view_test 2025-09-07T09:12:34.7430634Z c10_tempfile_test 2025-09-07T09:12:34.7430868Z c10_typeid_test 2025-09-07T09:12:34.7431105Z cmake_install.cmake 2025-09-07T09:12:34.7431355Z cpu_allocator_test 2025-09-07T09:12:34.7431596Z cpu_generator_test 2025-09-07T09:12:34.7431846Z cpu_profiling_allocator_test 2025-09-07T09:12:34.7432133Z cpu_rng_test 2025-09-07T09:12:34.7432370Z dlconvertor_test 2025-09-07T09:12:34.7432614Z example_allreduce 2025-09-07T09:12:34.7432861Z extension_backend_test 2025-09-07T09:12:34.7433117Z half_test 2025-09-07T09:12:34.7433345Z hip_apply_test 2025-09-07T09:12:34.7433590Z hip_complex_math_test 2025-09-07T09:12:34.7433853Z hip_complex_test 2025-09-07T09:12:34.7434102Z hip_distributions_test 2025-09-07T09:12:34.7434357Z hip_dlconvertor_test 2025-09-07T09:12:34.7434610Z hip_generator_test 2025-09-07T09:12:34.7434849Z hip_half_test 2025-09-07T09:12:34.7435084Z hip_integer_divider_test 2025-09-07T09:12:34.7435311Z hip_optional_test 2025-09-07T09:12:34.7435540Z hip_packedtensoraccessor_test 2025-09-07T09:12:34.7435789Z hip_vectorized_test 2025-09-07T09:12:34.7436002Z inline_container_test 2025-09-07T09:12:34.7436215Z ivalue_test 2025-09-07T09:12:34.7436420Z kernel_function_legacy_test 2025-09-07T09:12:34.7436650Z kernel_function_test 2025-09-07T09:12:34.7436868Z kernel_lambda_legacy_test 2025-09-07T09:12:34.7437098Z kernel_lambda_test 2025-09-07T09:12:34.7437305Z kernel_stackbased_test 2025-09-07T09:12:34.7437518Z lazy_tensor_test 2025-09-07T09:12:34.7437722Z legacy_vmap_test 2025-09-07T09:12:34.7437914Z libc10.so 
2025-09-07T09:12:34.7438100Z libc10_hip.so 2025-09-07T09:12:34.7438298Z libcaffe2_nvrtc.so 2025-09-07T09:12:34.7438486Z 'libmkldnn*' 2025-09-07T09:12:34.7438676Z 'libnccl*' 2025-09-07T09:12:34.7438858Z libtorch.so 2025-09-07T09:12:34.7439047Z libtorch_cpu.so 2025-09-07T09:12:34.7439255Z libtorch_global_deps.so 2025-09-07T09:12:34.7439483Z libtorch_hip.so 2025-09-07T09:12:34.7439683Z libtorch_python.so 2025-09-07T09:12:34.7439996Z libtorchbind_test.so 2025-09-07T09:12:34.7440227Z make_boxed_from_unboxed_functor_test 2025-09-07T09:12:34.7440486Z math_kernel_test 2025-09-07T09:12:34.7440747Z memory_format_test 2025-09-07T09:12:34.7440965Z memory_overlapping_test 2025-09-07T09:12:34.7441193Z mobile_memory_cleanup 2025-09-07T09:12:34.7441395Z native_test 2025-09-07T09:12:34.7441578Z op_allowlist_test 2025-09-07T09:12:34.7441778Z op_registration_test 2025-09-07T09:12:34.7441981Z operator_name_test 2025-09-07T09:12:34.7442176Z operators_test 2025-09-07T09:12:34.7442378Z packedtensoraccessor_test 2025-09-07T09:12:34.7442672Z parallel_benchmark 2025-09-07T09:12:34.7442938Z pow_test 2025-09-07T09:12:34.7443120Z protoc 2025-09-07T09:12:34.7443301Z protoc-3.13.0.0 2025-09-07T09:12:34.7443493Z quantized_test 2025-09-07T09:12:34.7443676Z reduce_ops_test 2025-09-07T09:12:34.7443874Z reportMemoryUsage_test 2025-09-07T09:12:34.7444092Z scalar_tensor_test 2025-09-07T09:12:34.7444288Z scalar_test 2025-09-07T09:12:34.7444474Z static_runtime_bench 2025-09-07T09:12:34.7444673Z static_runtime_test 2025-09-07T09:12:34.7444880Z stride_properties_test 2025-09-07T09:12:34.7445090Z tensor_iterator_test 2025-09-07T09:12:34.7445286Z test_api 2025-09-07T09:12:34.7445462Z test_cpp_rpc 2025-09-07T09:12:34.7445648Z test_dist_autograd 2025-09-07T09:12:34.7445835Z test_jit 2025-09-07T09:12:34.7445998Z test_lazy 2025-09-07T09:12:34.7446169Z test_nativert 2025-09-07T09:12:34.7446344Z test_parallel 2025-09-07T09:12:34.7446529Z thread_init_test 2025-09-07T09:12:34.7446721Z torch_shm_manager 
2025-09-07T09:12:34.7446907Z type_ptr_test 2025-09-07T09:12:34.7447085Z type_test 2025-09-07T09:12:34.7447273Z undefined_tensor_test 2025-09-07T09:12:34.7447502Z vec_test_all_types_AVX2 2025-09-07T09:12:34.7447719Z vec_test_all_types_AVX512 2025-09-07T09:12:34.7447941Z vec_test_all_types_DEFAULT 2025-09-07T09:12:34.7448162Z verify_api_visibility 2025-09-07T09:12:34.7448358Z weakref_test 2025-09-07T09:12:34.7448546Z wrapdim_test 2025-09-07T09:12:34.7448742Z xla_tensor_test 2025-09-07T09:12:34.7448941Z + aten/tools/run_tests.sh build/bin 2025-09-07T09:12:34.7449189Z + set -e 2025-09-07T09:12:34.7449381Z ++ dirname aten/tools/run_tests.sh 2025-09-07T09:12:34.7458087Z + VALGRIND_SUP=/var/lib/jenkins/pytorch/aten/tools/valgrind.sup 2025-09-07T09:12:34.7458458Z + export CPP_TESTS_DIR=build/bin 2025-09-07T09:12:34.7458702Z + CPP_TESTS_DIR=build/bin 2025-09-07T09:12:34.7458913Z + VALGRIND=OFF 2025-09-07T09:12:34.7460734Z + python test/run_test.py --cpp --verbose -i cpp/basic cpp/atest cpp/scalar_test cpp/broadcast_test cpp/wrapdim_test cpp/apply_utils_test cpp/dlconvertor_test cpp/native_test cpp/scalar_tensor_test cpp/undefined_tensor_test cpp/extension_backend_test cpp/lazy_tensor_test cpp/tensor_iterator_test cpp/Dimname_test cpp/Dict_test cpp/NamedTensor_test cpp/cpu_generator_test cpp/legacy_vmap_test cpp/operators_test 2025-09-07T09:12:37.3480951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T09:12:37.3483096Z import pkg_resources 2025-09-07T09:12:39.6458873Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-09-07T09:12:39.6589757Z Found test times from artifacts 2025-09-07T09:12:39.7143849Z Found test times from artifacts 2025-09-07T09:12:39.7160926Z Running all tests 2025-09-07T09:12:39.7165530Z Running parallel tests on 8 processes 2025-09-07T09:12:39.7167112Z Name: tests to run (est. time: 0.0min) 2025-09-07T09:12:39.7167409Z Serial tests (0): 2025-09-07T09:12:39.7167610Z Parallel tests (19): 2025-09-07T09:12:39.7167824Z cpp/Dict_test 1/1 2025-09-07T09:12:39.7168035Z cpp/Dimname_test 1/1 2025-09-07T09:12:39.7168271Z cpp/NamedTensor_test 1/1 2025-09-07T09:12:39.7168855Z cpp/apply_utils_test 1/1 2025-09-07T09:12:39.7169078Z cpp/atest 1/1 2025-09-07T09:12:39.7169269Z cpp/basic 1/1 2025-09-07T09:12:39.7169589Z cpp/broadcast_test 1/1 2025-09-07T09:12:39.7169835Z cpp/cpu_generator_test 1/1 2025-09-07T09:12:39.7170092Z cpp/dlconvertor_test 1/1 2025-09-07T09:12:39.7170331Z cpp/extension_backend_test 1/1 2025-09-07T09:12:39.7170586Z cpp/lazy_tensor_test 1/1 2025-09-07T09:12:39.7170811Z cpp/legacy_vmap_test 1/1 2025-09-07T09:12:39.7171034Z cpp/native_test 1/1 2025-09-07T09:12:39.7171372Z cpp/operators_test 1/1 2025-09-07T09:12:39.7171713Z cpp/scalar_tensor_test 1/1 2025-09-07T09:12:39.7171939Z cpp/scalar_test 1/1 2025-09-07T09:12:39.7172153Z cpp/tensor_iterator_test 1/1 2025-09-07T09:12:39.7172401Z cpp/undefined_tensor_test 1/1 2025-09-07T09:12:39.7172644Z cpp/wrapdim_test 1/1 2025-09-07T09:12:39.7172880Z Name: excluded (est. time: 0.0min) 2025-09-07T09:12:39.7173124Z Serial tests (0): 2025-09-07T09:12:39.7173327Z Parallel tests (0): 2025-09-07T09:12:39.7173613Z Running cpp/Dict_test 1/1 ... 
[2025-09-07 09:12:39.717214] 2025-09-07T09:12:39.7174071Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:39.7180300Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/Dict_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-d623f931cb70820e.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:39.717852] 2025-09-07T09:12:40.7851502Z 2025-09-07T09:12:40.7852898Z cpp/Dict_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.Dict_test_1.1_cd3a40a7b65c5ffc_.log 2025-09-07T09:12:40.7853983Z 2025-09-07T09:12:40.7854399Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T09:12:40.7855115Z Uploading artifacts took 0.00 seconds 2025-09-07T09:12:40.7855689Z Running cpp/Dimname_test 1/1 ... [2025-09-07 09:12:40.785225] 2025-09-07T09:12:40.7856282Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:40.7858779Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/Dimname_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-098c56fd18a6b6d1.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:40.785621] 2025-09-07T09:12:41.9026281Z 2025-09-07T09:12:41.9027479Z cpp/Dimname_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.Dimname_test_1.1_a5a322323fbca916_.log 2025-09-07T09:12:41.9028447Z 2025-09-07T09:12:41.9028806Z Running cpp/NamedTensor_test 1/1 ... [2025-09-07 09:12:41.902599] 2025-09-07T09:12:41.9029447Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:41.9033671Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/NamedTensor_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-a169f8f5bd0172e1.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:12:41.903132] 2025-09-07T09:12:43.0198915Z 2025-09-07T09:12:43.0200437Z cpp/NamedTensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.NamedTensor_test_1.1_02b80032c5f2be32_.log 2025-09-07T09:12:43.0201613Z 2025-09-07T09:12:43.0201990Z Running cpp/apply_utils_test 1/1 ... [2025-09-07 09:12:43.019797] 2025-09-07T09:12:43.0202708Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:43.0206085Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/apply_utils_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-48c0e2f03d460d64.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:43.020335] 2025-09-07T09:12:44.1370898Z 2025-09-07T09:12:44.1371922Z cpp/apply_utils_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.apply_utils_test_1.1_fd4e84008df00b1d_.log 2025-09-07T09:12:44.1372880Z 2025-09-07T09:12:44.1373628Z Running cpp/atest 1/1 ... [2025-09-07 09:12:44.137100] 2025-09-07T09:12:44.1374339Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:44.1379025Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/atest', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-9b512f7ee3ab8b10.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:44.137618] 2025-09-07T09:12:45.2543765Z 2025-09-07T09:12:45.2545278Z cpp/atest 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.atest_1.1_c13eb610a56898d7_.log 2025-09-07T09:12:45.2546720Z 2025-09-07T09:12:45.2547117Z Running cpp/basic 1/1 ... 
[2025-09-07 09:12:45.254372] 2025-09-07T09:12:45.2547541Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:45.2550897Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/basic', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-c708d367e0b76b0b.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:45.254860] 2025-09-07T09:12:46.3715719Z 2025-09-07T09:12:46.3717005Z cpp/basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.basic_1.1_4042d3ecb39f7120_.log 2025-09-07T09:12:46.3717839Z 2025-09-07T09:12:46.3718167Z Running cpp/broadcast_test 1/1 ... [2025-09-07 09:12:46.371566] 2025-09-07T09:12:46.3718786Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:46.3722893Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/broadcast_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-df1cbdd1b3251c9a.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:46.372023] 2025-09-07T09:12:47.4888860Z 2025-09-07T09:12:47.4889936Z cpp/broadcast_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.broadcast_test_1.1_08a235a491642a85_.log 2025-09-07T09:12:47.4890618Z 2025-09-07T09:12:47.4890885Z Running cpp/cpu_generator_test 1/1 ... [2025-09-07 09:12:47.488844] 2025-09-07T09:12:47.4891350Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:47.4894726Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/cpu_generator_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-8a971552a3fdc8c9.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:12:47.489209] 2025-09-07T09:12:48.6062777Z 2025-09-07T09:12:48.6064940Z cpp/cpu_generator_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.cpu_generator_test_1.1_ed987edcad0755d3_.log 2025-09-07T09:12:48.6065987Z 2025-09-07T09:12:48.6066388Z Running cpp/dlconvertor_test 1/1 ... [2025-09-07 09:12:48.606237] 2025-09-07T09:12:48.6067116Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:48.6071320Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/dlconvertor_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-e11eed27b6edc24a.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:48.606896] 2025-09-07T09:12:49.7236691Z 2025-09-07T09:12:49.7237973Z cpp/dlconvertor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.dlconvertor_test_1.1_725b01298c11bbf9_.log 2025-09-07T09:12:49.7238697Z 2025-09-07T09:12:49.7238953Z Running cpp/extension_backend_test 1/1 ... [2025-09-07 09:12:49.723636] 2025-09-07T09:12:49.7239435Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:49.7243180Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/extension_backend_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-76a79ec87842ed0a.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:49.724048] 2025-09-07T09:12:50.8410396Z 2025-09-07T09:12:50.8412001Z cpp/extension_backend_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.extension_backend_test_1.1_94507847284c85b1_.log 2025-09-07T09:12:50.8413233Z 2025-09-07T09:12:50.8413471Z Running cpp/lazy_tensor_test 1/1 ... 
[2025-09-07 09:12:50.841052] 2025-09-07T09:12:50.8414349Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:50.8417495Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/lazy_tensor_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-8143724fe2889d75.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:50.841512] 2025-09-07T09:12:51.9582638Z 2025-09-07T09:12:51.9583933Z cpp/lazy_tensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.lazy_tensor_test_1.1_30d9ed23464c0660_.log 2025-09-07T09:12:51.9584768Z 2025-09-07T09:12:51.9585003Z Running cpp/legacy_vmap_test 1/1 ... [2025-09-07 09:12:51.958188] 2025-09-07T09:12:51.9585446Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:51.9587721Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/legacy_vmap_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-389f76741d997d87.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:51.958547] 2025-09-07T09:12:53.0756264Z 2025-09-07T09:12:53.0757583Z cpp/legacy_vmap_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.legacy_vmap_test_1.1_501c0e46ac198db4_.log 2025-09-07T09:12:53.0758546Z 2025-09-07T09:12:53.0758838Z Running cpp/native_test 1/1 ... [2025-09-07 09:12:53.075627] 2025-09-07T09:12:53.0759449Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:53.0764903Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/native_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-4fcf478769495f34.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:12:53.076198] 2025-09-07T09:12:54.1929857Z 2025-09-07T09:12:54.1931085Z cpp/native_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.native_test_1.1_889dfb1c5cfc0c0e_.log 2025-09-07T09:12:54.1932138Z 2025-09-07T09:12:54.1932533Z Running cpp/operators_test 1/1 ... [2025-09-07 09:12:54.192975] 2025-09-07T09:12:54.1933253Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:54.1938066Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/operators_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-8987a45198a81808.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:54.193581] 2025-09-07T09:12:55.3103773Z 2025-09-07T09:12:55.3105310Z cpp/operators_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.operators_test_1.1_4c21cdbcff4c45ba_.log 2025-09-07T09:12:55.3106432Z 2025-09-07T09:12:55.3106823Z Running cpp/scalar_tensor_test 1/1 ... [2025-09-07 09:12:55.310371] 2025-09-07T09:12:55.3107389Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:55.3111118Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/scalar_tensor_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-2e4db051db0d6131.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:55.310889] 2025-09-07T09:12:56.4282769Z 2025-09-07T09:12:56.4284313Z cpp/scalar_tensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.scalar_tensor_test_1.1_43209a80c6159d6a_.log 2025-09-07T09:12:56.4285501Z 2025-09-07T09:12:56.4288538Z Running cpp/scalar_test 1/1 ... 
[2025-09-07 09:12:56.428124] 2025-09-07T09:12:56.4289028Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:56.4290224Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/scalar_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-2cb036d074ba5de8.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:56.428742] 2025-09-07T09:12:57.5455902Z 2025-09-07T09:12:57.5457098Z cpp/scalar_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.scalar_test_1.1_117d8c6a622909b7_.log 2025-09-07T09:12:57.5457760Z 2025-09-07T09:12:57.5458007Z Running cpp/tensor_iterator_test 1/1 ... [2025-09-07 09:12:57.545588] 2025-09-07T09:12:57.5458476Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:57.5462407Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/tensor_iterator_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-3cd34ba5b40ba57b.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:57.545954] 2025-09-07T09:12:58.6627277Z 2025-09-07T09:12:58.6628419Z cpp/tensor_iterator_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.tensor_iterator_test_1.1_a1301a6c36f62262_.log 2025-09-07T09:12:58.6629562Z 2025-09-07T09:12:58.6629961Z Running cpp/undefined_tensor_test 1/1 ... [2025-09-07 09:12:58.662648] 2025-09-07T09:12:58.6630611Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:58.6633611Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/undefined_tensor_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-acdb5ec9c442e06f.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:12:58.663139] 2025-09-07T09:12:59.7797817Z 2025-09-07T09:12:59.7798671Z cpp/undefined_tensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.undefined_tensor_test_1.1_ebb8693abab52649_.log 2025-09-07T09:12:59.7799417Z 2025-09-07T09:12:59.7799635Z Running cpp/wrapdim_test 1/1 ... [2025-09-07 09:12:59.779715] 2025-09-07T09:12:59.7800069Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:59.7802621Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/wrapdim_test', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-e49e388d540243f2.xml', '-x', '--reruns=2'] ... [2025-09-07 09:12:59.780048] 2025-09-07T09:13:00.8971282Z 2025-09-07T09:13:00.8972759Z cpp/wrapdim_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.wrapdim_test_1.1_e987cc87798ac6ec_.log 2025-09-07T09:13:00.8973484Z 2025-09-07T09:13:03.9382940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:03.9385066Z import pkg_resources 2025-09-07T09:13:04.0612830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:04.0615122Z import pkg_resources 2025-09-07T09:13:04.1021012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:04.1023002Z import pkg_resources 2025-09-07T09:13:04.1213031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:04.1215133Z import pkg_resources 2025-09-07T09:13:04.1247577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:04.1249767Z import pkg_resources 2025-09-07T09:13:04.1360839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:04.1362995Z import pkg_resources 2025-09-07T09:13:04.1425171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T09:13:04.1427138Z import pkg_resources 2025-09-07T09:13:04.1465752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:13:04.1467702Z import pkg_resources 2025-09-07T09:13:04.5556174Z Running cpp/Dict_test 1/1 ... [2025-09-07 09:13:04.555395] 2025-09-07T09:13:04.5556629Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.5561461Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/Dict_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-a466db7dcc1a707a.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.555930] 2025-09-07T09:13:04.8117743Z Running cpp/Dimname_test 1/1 ... [2025-09-07 09:13:04.811584] 2025-09-07T09:13:04.8118200Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.8123584Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/Dimname_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-9ba7c9fcf0527b76.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.812150] 2025-09-07T09:13:04.8956462Z Running cpp/NamedTensor_test 1/1 ... [2025-09-07 09:13:04.895470] 2025-09-07T09:13:04.8956913Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.8962241Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/NamedTensor_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-ac6f71102ce87852.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.896002] 2025-09-07T09:13:04.9272666Z Running cpp/apply_utils_test 1/1 ... 
[2025-09-07 09:13:04.927087] 2025-09-07T09:13:04.9273172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.9274162Z Running cpp/atest 1/1 ... [2025-09-07 09:13:04.927295] 2025-09-07T09:13:04.9274604Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.9275617Z Running cpp/basic 1/1 ... [2025-09-07 09:13:04.927418] 2025-09-07T09:13:04.9276043Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.9278065Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/apply_utils_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-13683ea1bde6eb8d.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.927599] 2025-09-07T09:13:04.9280393Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/atest', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-06fb321e6e13c31a.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.927811] 2025-09-07T09:13:04.9282825Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/basic', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-c8ffb834afefac41.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.927960] 2025-09-07T09:13:04.9313560Z Running cpp/broadcast_test 1/1 ... [2025-09-07 09:13:04.931189] 2025-09-07T09:13:04.9313990Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.9319941Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/broadcast_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-757f9cb121f23fbd.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.931762] 2025-09-07T09:13:04.9515099Z Running cpp/cpu_generator_test 1/1 ... 
[2025-09-07 09:13:04.951344] 2025-09-07T09:13:04.9515720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:04.9521826Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/cpu_generator_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-4b07eb0444915829.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:04.951925] 2025-09-07T09:13:05.7230450Z 2025-09-07T09:13:05.7231671Z cpp/Dict_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.Dict_test_1.1_f8d6095a6065f36b_.log 2025-09-07T09:13:05.7232624Z 2025-09-07T09:13:05.7232930Z Running cpp/dlconvertor_test 1/1 ... [2025-09-07 09:13:05.723107] 2025-09-07T09:13:05.7233578Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:05.7238632Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/dlconvertor_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-bd60db83a9eaa3a9.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:05.723600] 2025-09-07T09:13:06.0293291Z 2025-09-07T09:13:06.0294539Z cpp/Dimname_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.Dimname_test_1.1_24c5dee9882c7b72_.log 2025-09-07T09:13:06.0295427Z 2025-09-07T09:13:06.0295752Z Running cpp/extension_backend_test 1/1 ... [2025-09-07 09:13:06.029406] 2025-09-07T09:13:06.0296382Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.0300328Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/extension_backend_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-3da4d8c1f537f909.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:13:06.029795] 2025-09-07T09:13:06.1130344Z 2025-09-07T09:13:06.1131145Z cpp/NamedTensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.NamedTensor_test_1.1_361a97e6fb2f449a_.log 2025-09-07T09:13:06.1132099Z 2025-09-07T09:13:06.1134022Z Running cpp/lazy_tensor_test 1/1 ... [2025-09-07 09:13:06.113229] 2025-09-07T09:13:06.1134637Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.1138832Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/lazy_tensor_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-b09352953f91cfe8.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:06.113613] 2025-09-07T09:13:06.1449314Z 2025-09-07T09:13:06.1449901Z cpp/apply_utils_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.apply_utils_test_1.1_af998c846e19ec23_.log 2025-09-07T09:13:06.1450851Z 2025-09-07T09:13:06.1450943Z 2025-09-07T09:13:06.1451413Z cpp/atest 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.atest_1.1_f24c8271d9c3fe05_.log 2025-09-07T09:13:06.1451974Z 2025-09-07T09:13:06.1452393Z cpp/basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.basic_1.1_a71fa98643f1d863_.log 2025-09-07T09:13:06.1453126Z Running cpp/legacy_vmap_test 1/1 ... [2025-09-07 09:13:06.145149] 2025-09-07T09:13:06.1453557Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.1454286Z 2025-09-07T09:13:06.1454290Z 2025-09-07T09:13:06.1456471Z Running cpp/native_test 1/1 ... [2025-09-07 09:13:06.145490] 2025-09-07T09:13:06.1457899Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/legacy_vmap_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-27441501a05e9d2c.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:13:06.145526] 2025-09-07T09:13:06.1459309Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.1459830Z Running cpp/operators_test 1/1 ... [2025-09-07 09:13:06.145559] 2025-09-07T09:13:06.1460340Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.1462298Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/native_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-e219244032e41def.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:06.146047] 2025-09-07T09:13:06.1464304Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/operators_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-f4b3dde65265f9b7.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:06.146097] 2025-09-07T09:13:06.1485691Z 2025-09-07T09:13:06.1486150Z cpp/broadcast_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.broadcast_test_1.1_ffc88406625099e3_.log 2025-09-07T09:13:06.1487851Z 2025-09-07T09:13:06.1490386Z Running cpp/scalar_tensor_test 1/1 ... [2025-09-07 09:13:06.148917] 2025-09-07T09:13:06.1490830Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.1495062Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/scalar_tensor_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-a5dd3cc379a8bb2c.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:06.149298] 2025-09-07T09:13:06.1694891Z 2025-09-07T09:13:06.1695722Z cpp/cpu_generator_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.cpu_generator_test_1.1_9db8147fc518aaa9_.log 2025-09-07T09:13:06.1697995Z 2025-09-07T09:13:06.1700767Z Running cpp/scalar_test 1/1 ... 
[2025-09-07 09:13:06.169972] 2025-09-07T09:13:06.1701164Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.1705530Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/scalar_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-c0b7fe6758e9fbea.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:06.170383] 2025-09-07T09:13:06.8909540Z 2025-09-07T09:13:06.8910815Z cpp/dlconvertor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.dlconvertor_test_1.1_5bfae426f7670139_.log 2025-09-07T09:13:06.8911847Z 2025-09-07T09:13:06.8912180Z Running cpp/tensor_iterator_test 1/1 ... [2025-09-07 09:13:06.890849] 2025-09-07T09:13:06.8912744Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:06.8915415Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/tensor_iterator_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-7fc35a574a62f664.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:06.891301] 2025-09-07T09:13:07.2971021Z 2025-09-07T09:13:07.2972140Z cpp/extension_backend_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.extension_backend_test_1.1_8e84e31a05094480_.log 2025-09-07T09:13:07.2972913Z 2025-09-07T09:13:07.2973160Z Running cpp/undefined_tensor_test 1/1 ... [2025-09-07 09:13:07.296973] 2025-09-07T09:13:07.2973612Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:07.2975419Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/undefined_tensor_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-0c6db03c8387d7ec.xml', '-x', '--reruns=2'] ... 
[2025-09-07 09:13:07.297335] 2025-09-07T09:13:07.3306478Z 2025-09-07T09:13:07.3307481Z cpp/lazy_tensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.lazy_tensor_test_1.1_a2be41caa4db1a76_.log 2025-09-07T09:13:07.3308315Z 2025-09-07T09:13:07.3309037Z Running cpp/wrapdim_test 1/1 ... [2025-09-07 09:13:07.330737] 2025-09-07T09:13:07.3309547Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:13:07.3313652Z Executing ['pytest', '/var/lib/jenkins/pytorch/build/bin/wrapdim_test', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '8', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-5d04585a3d946779.xml', '-x', '--reruns=2'] ... [2025-09-07 09:13:07.331165] 2025-09-07T09:13:07.3622884Z 2025-09-07T09:13:07.3623542Z cpp/legacy_vmap_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.legacy_vmap_test_1.1_ec504d64a24ac5de_.log 2025-09-07T09:13:07.3624282Z 2025-09-07T09:13:07.3630796Z 2025-09-07T09:13:07.3631526Z cpp/operators_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.operators_test_1.1_357de19693926a72_.log 2025-09-07T09:13:07.3632152Z 2025-09-07T09:13:07.3632162Z 2025-09-07T09:13:07.3632606Z cpp/native_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.native_test_1.1_d0b48cc7da7706b1_.log 2025-09-07T09:13:07.3633168Z 2025-09-07T09:13:07.3661959Z 2025-09-07T09:13:07.3662526Z cpp/scalar_tensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.scalar_tensor_test_1.1_e79a881924a83cc7_.log 2025-09-07T09:13:07.3663177Z 2025-09-07T09:13:07.3877914Z 2025-09-07T09:13:07.3878421Z cpp/scalar_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.scalar_test_1.1_9d800199d8686903_.log 2025-09-07T09:13:07.3879007Z 2025-09-07T09:13:08.0585185Z 2025-09-07T09:13:08.0586375Z cpp/tensor_iterator_test 1/1 
was successful, full logs can be found in artifacts with path test/test-reports/cpp.tensor_iterator_test_1.1_0beb02bed5780e2c_.log 2025-09-07T09:13:08.0587302Z 2025-09-07T09:13:08.4143212Z 2025-09-07T09:13:08.4144708Z cpp/undefined_tensor_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.undefined_tensor_test_1.1_f80df47c84373e9b_.log 2025-09-07T09:13:08.4145665Z 2025-09-07T09:13:08.4483732Z 2025-09-07T09:13:08.4484517Z cpp/wrapdim_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.wrapdim_test_1.1_cfc92185f65a7cf0_.log 2025-09-07T09:13:08.4485410Z 2025-09-07T09:13:09.2173763Z Running test batch 'tests to run' cost 29.5 seconds 2025-09-07T09:13:09.7906808Z + run_if_exists tensor_interop_test 2025-09-07T09:13:09.7907240Z + local test_name=tensor_interop_test 2025-09-07T09:13:09.7907607Z + [[ -x build/bin/tensor_interop_test ]] 2025-09-07T09:13:09.7908041Z + echo 'Warning: tensor_interop_test does not exist.' 2025-09-07T09:13:09.7908483Z Warning: tensor_interop_test does not exist. 2025-09-07T09:13:09.7908840Z + run_if_exists cudnn_test 2025-09-07T09:13:09.7909138Z + local test_name=cudnn_test 2025-09-07T09:13:09.7909441Z + [[ -x build/bin/cudnn_test ]] 2025-09-07T09:13:09.7909764Z + echo 'Warning: cudnn_test does not exist.' 2025-09-07T09:13:09.7910111Z Warning: cudnn_test does not exist. 2025-09-07T09:13:09.7910438Z + run_if_exists cuda_generator_test 2025-09-07T09:13:09.7910754Z + local test_name=cuda_generator_test 2025-09-07T09:13:09.7911091Z + [[ -x build/bin/cuda_generator_test ]] 2025-09-07T09:13:09.7911491Z + echo 'Warning: cuda_generator_test does not exist.' 2025-09-07T09:13:09.7911887Z Warning: cuda_generator_test does not exist. 
2025-09-07T09:13:09.7912223Z + run_if_exists apply_test 2025-09-07T09:13:09.7912496Z + local test_name=apply_test 2025-09-07T09:13:09.7912796Z + [[ -x build/bin/apply_test ]] 2025-09-07T09:13:09.7913123Z + echo 'Warning: apply_test does not exist.' 2025-09-07T09:13:09.7913909Z Warning: apply_test does not exist. 2025-09-07T09:13:09.7914243Z + run_if_exists stream_test 2025-09-07T09:13:09.7914545Z + local test_name=stream_test 2025-09-07T09:13:09.7915001Z + [[ -x build/bin/stream_test ]] 2025-09-07T09:13:09.7915337Z + echo 'Warning: stream_test does not exist.' 2025-09-07T09:13:09.7915689Z Warning: stream_test does not exist. 2025-09-07T09:13:09.7916002Z + run_if_exists cuda_half_test 2025-09-07T09:13:09.7916297Z + local test_name=cuda_half_test 2025-09-07T09:13:09.7916590Z + [[ -x build/bin/cuda_half_test ]] 2025-09-07T09:13:09.7917063Z + echo 'Warning: cuda_half_test does not exist.' 2025-09-07T09:13:09.7917584Z Warning: cuda_half_test does not exist. 2025-09-07T09:13:09.7917876Z + run_if_exists cuda_vectorized_test 2025-09-07T09:13:09.7918141Z + local test_name=cuda_vectorized_test 2025-09-07T09:13:09.7918415Z + [[ -x build/bin/cuda_vectorized_test ]] 2025-09-07T09:13:09.7918741Z + echo 'Warning: cuda_vectorized_test does not exist.' 2025-09-07T09:13:09.7919067Z Warning: cuda_vectorized_test does not exist. 2025-09-07T09:13:09.7919354Z + run_if_exists cuda_distributions_test 2025-09-07T09:13:09.7919629Z + local test_name=cuda_distributions_test 2025-09-07T09:13:09.7919921Z + [[ -x build/bin/cuda_distributions_test ]] 2025-09-07T09:13:09.7920244Z + echo 'Warning: cuda_distributions_test does not exist.' 2025-09-07T09:13:09.7920590Z Warning: cuda_distributions_test does not exist. 2025-09-07T09:13:09.7920883Z + run_if_exists cuda_optional_test 2025-09-07T09:13:09.7921137Z + local test_name=cuda_optional_test 2025-09-07T09:13:09.7921405Z + [[ -x build/bin/cuda_optional_test ]] 2025-09-07T09:13:09.7921701Z + echo 'Warning: cuda_optional_test does not exist.' 
2025-09-07T09:13:09.7922013Z Warning: cuda_optional_test does not exist. 2025-09-07T09:13:09.7922296Z + run_if_exists cuda_tensor_interop_test 2025-09-07T09:13:09.7922571Z + local test_name=cuda_tensor_interop_test 2025-09-07T09:13:09.7922858Z + [[ -x build/bin/cuda_tensor_interop_test ]] 2025-09-07T09:13:09.7923196Z + echo 'Warning: cuda_tensor_interop_test does not exist.' 2025-09-07T09:13:09.7923541Z Warning: cuda_tensor_interop_test does not exist. 2025-09-07T09:13:09.7923837Z + run_if_exists cuda_complex_test 2025-09-07T09:13:09.7924090Z + local test_name=cuda_complex_test 2025-09-07T09:13:09.7924343Z + [[ -x build/bin/cuda_complex_test ]] 2025-09-07T09:13:09.7924634Z + echo 'Warning: cuda_complex_test does not exist.' 2025-09-07T09:13:09.7924942Z Warning: cuda_complex_test does not exist. 2025-09-07T09:13:09.7925215Z + run_if_exists cuda_complex_math_test 2025-09-07T09:13:09.7925484Z + local test_name=cuda_complex_math_test 2025-09-07T09:13:09.7925757Z + [[ -x build/bin/cuda_complex_math_test ]] 2025-09-07T09:13:09.7926069Z + echo 'Warning: cuda_complex_math_test does not exist.' 2025-09-07T09:13:09.7926400Z Warning: cuda_complex_math_test does not exist. 2025-09-07T09:13:09.7926681Z + run_if_exists cuda_cub_test 2025-09-07T09:13:09.7926917Z + local test_name=cuda_cub_test 2025-09-07T09:13:09.7927158Z + [[ -x build/bin/cuda_cub_test ]] 2025-09-07T09:13:09.7927424Z + echo 'Warning: cuda_cub_test does not exist.' 2025-09-07T09:13:09.7927706Z Warning: cuda_cub_test does not exist. 2025-09-07T09:13:09.7927974Z + run_if_exists cuda_atomic_ops_test 2025-09-07T09:13:09.7928228Z + local test_name=cuda_atomic_ops_test 2025-09-07T09:13:09.7928491Z + [[ -x build/bin/cuda_atomic_ops_test ]] 2025-09-07T09:13:09.7928785Z + echo 'Warning: cuda_atomic_ops_test does not exist.' 2025-09-07T09:13:09.7929101Z Warning: cuda_atomic_ops_test does not exist. 
2025-09-07T09:13:09.7929365Z + '[' OFF == ON ']' 2025-09-07T09:13:09.7929569Z + [[ -n '' ]] 2025-09-07T09:13:09.7929760Z + assert_git_not_dirty 2025-09-07T09:13:09.7929992Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-09-07T09:13:09.7930254Z + test_libtorch 1 2025-09-07T09:13:09.7930441Z + local SHARD=1 2025-09-07T09:13:09.7930631Z + [[ slow != \s\l\o\w ]] 2025-09-07T09:13:09.7930864Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-09-07T09:13:09.7931118Z + sccache_epilogue 2025-09-07T09:13:09.7931513Z + echo '::group::Sccache Compilation Log' 2025-09-07T09:13:09.7932117Z ##[group]Sccache Compilation Log 2025-09-07T09:13:09.7932503Z + echo '=================== sccache compilation log ===================' 2025-09-07T09:13:09.7932851Z =================== sccache compilation log =================== 2025-09-07T09:13:09.7933356Z + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-09-07T09:13:09.8072148Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-09-07T09:13:09.8072851Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-09-07T09:13:09.8073400Z + sccache --show-stats 2025-09-07T09:13:09.8106719Z Compile requests 1047 2025-09-07T09:13:09.8107069Z Compile requests executed 147 2025-09-07T09:13:09.8107383Z Cache hits 43 2025-09-07T09:13:09.8107730Z Cache hits (C/C++) 37 2025-09-07T09:13:09.8108213Z Cache hits (HIP) 6 2025-09-07T09:13:09.8108654Z Cache misses 102 2025-09-07T09:13:09.8109079Z Cache misses (C/C++) 89 2025-09-07T09:13:09.8109505Z Cache misses (HIP) 13 2025-09-07T09:13:09.8109938Z Cache hits rate 29.66 % 2025-09-07T09:13:09.8110381Z Cache hits rate (C/C++) 29.37 % 2025-09-07T09:13:09.8110823Z Cache hits rate (HIP) 31.58 % 2025-09-07T09:13:09.8111256Z Cache timeouts 0 2025-09-07T09:13:09.8111683Z Cache read errors 0 2025-09-07T09:13:09.8112111Z Forced recaches 0 
2025-09-07T09:13:09.8112530Z Cache write errors 0 2025-09-07T09:13:09.8112942Z Cache errors 0 2025-09-07T09:13:09.8113370Z Compilations 102 2025-09-07T09:13:09.8113799Z Compilation failures 2 2025-09-07T09:13:09.8114254Z Non-cacheable compilations 0 2025-09-07T09:13:09.8114699Z Non-cacheable calls 20 2025-09-07T09:13:09.8115147Z Non-compilation calls 880 2025-09-07T09:13:09.8115592Z Unsupported compiler calls 0 2025-09-07T09:13:09.8116042Z Average cache write 0.000 s 2025-09-07T09:13:09.8116496Z Average compiler 11.650 s 2025-09-07T09:13:09.8116941Z Average cache read hit 0.000 s 2025-09-07T09:13:09.8117394Z Failed distributed compilations 0 2025-09-07T09:13:09.8117703Z 2025-09-07T09:13:09.8117849Z Non-cacheable reasons: 2025-09-07T09:13:09.8118163Z -E 18 2025-09-07T09:13:09.8118416Z unknown source language 2 2025-09-07T09:13:09.8118592Z 2025-09-07T09:13:09.8118755Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-09-07T09:13:09.8119109Z Use direct/preprocessor mode? yes 2025-09-07T09:13:09.8119374Z Version (client) 0.10.0 2025-09-07T09:13:09.8119636Z Cache size 39 MiB 2025-09-07T09:13:09.8119907Z Max cache size 10 GiB 2025-09-07T09:13:09.8120170Z + sccache --stop-server 2025-09-07T09:13:09.8146155Z Stopping sccache server... 
2025-09-07T09:13:09.8149581Z Compile requests 1047 2025-09-07T09:13:09.8150118Z Compile requests executed 147 2025-09-07T09:13:09.8150556Z Cache hits 43 2025-09-07T09:13:09.8150972Z Cache hits (C/C++) 37 2025-09-07T09:13:09.8151399Z Cache hits (HIP) 6 2025-09-07T09:13:09.8151818Z Cache misses 102 2025-09-07T09:13:09.8152234Z Cache misses (C/C++) 89 2025-09-07T09:13:09.8152655Z Cache misses (HIP) 13 2025-09-07T09:13:09.8153087Z Cache hits rate 29.66 % 2025-09-07T09:13:09.8153529Z Cache hits rate (C/C++) 29.37 % 2025-09-07T09:13:09.8154173Z Cache hits rate (HIP) 31.58 % 2025-09-07T09:13:09.8154605Z Cache timeouts 0 2025-09-07T09:13:09.8155034Z Cache read errors 0 2025-09-07T09:13:09.8155605Z Forced recaches 0 2025-09-07T09:13:09.8156034Z Cache write errors 0 2025-09-07T09:13:09.8156451Z Cache errors 0 2025-09-07T09:13:09.8156861Z Compilations 102 2025-09-07T09:13:09.8157311Z Compilation failures 2 2025-09-07T09:13:09.8157858Z Non-cacheable compilations 0 2025-09-07T09:13:09.8158574Z Non-cacheable calls 20 2025-09-07T09:13:09.8159330Z Non-compilation calls 880 2025-09-07T09:13:09.8159899Z Unsupported compiler calls 0 2025-09-07T09:13:09.8160462Z Average cache write 0.000 s 2025-09-07T09:13:09.8161025Z Average compiler 11.650 s 2025-09-07T09:13:09.8161574Z Average cache read hit 0.000 s 2025-09-07T09:13:09.8161924Z Failed distributed compilations 0 2025-09-07T09:13:09.8162146Z 2025-09-07T09:13:09.8162240Z Non-cacheable reasons: 2025-09-07T09:13:09.8162463Z -E 18 2025-09-07T09:13:09.8162729Z unknown source language 2 2025-09-07T09:13:09.8162915Z 2025-09-07T09:13:09.8163088Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-09-07T09:13:09.8163465Z Use direct/preprocessor mode? 
yes 2025-09-07T09:13:09.8163741Z Version (client) 0.10.0 2025-09-07T09:13:09.8164014Z Cache size 39 MiB 2025-09-07T09:13:09.8164295Z Max cache size 10 GiB 2025-09-07T09:13:09.8164569Z + echo ::endgroup:: 2025-09-07T09:13:09.8164975Z ##[endgroup] 2025-09-07T09:13:09.8267214Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-09-07T09:13:09.8267968Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-09-07T09:13:09.8268875Z docker exec -t "9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" 2025-09-07T09:13:09.8307504Z shell: /usr/bin/bash -e {0} 2025-09-07T09:13:09.8307754Z env: 2025-09-07T09:13:09.8307957Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:09.8308355Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:09.8308933Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:09.8309473Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:09.8310364Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:09.8311116Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:09.8311366Z AWS_REGION: us-east-1 2025-09-07T09:13:09.8311655Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:09.8311979Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:09.8317099Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:09.8317463Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:09.8317855Z ##[endgroup] 2025-09-07T09:13:10.1009458Z ##[group]Run cat test/**/*_toprint.log || true 2025-09-07T09:13:10.1009819Z cat test/**/*_toprint.log || true 2025-09-07T09:13:10.1048971Z shell: 
/usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:10.1049327Z env: 2025-09-07T09:13:10.1049537Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:10.1049900Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:10.1050458Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:10.1050969Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:10.1051984Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:10.1052744Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:10.1053004Z AWS_REGION: us-east-1 2025-09-07T09:13:10.1053358Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:10.1053686Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:10.1058749Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:10.1059120Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:10.1059523Z ##[endgroup] 2025-09-07T09:13:10.1231667Z cat: 'test/**/*_toprint.log': No such file or directory 2025-09-07T09:13:10.1345634Z Prepare all required actions 2025-09-07T09:13:10.1346221Z Getting action download info 2025-09-07T09:13:10.4619237Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-09-07T09:13:11.1148786Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-09-07T09:13:11.6977599Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-09-07T09:13:11.6977913Z with: 2025-09-07T09:13:11.6978096Z use-gha: true 2025-09-07T09:13:11.6978368Z file-suffix: test-slow-1-2-linux.rocm.gpu.2_49774352868 2025-09-07T09:13:11.6978694Z s3-bucket: gha-artifacts 2025-09-07T09:13:11.6978914Z env: 2025-09-07T09:13:11.6979100Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:11.6979454Z 
RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:11.6979987Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:11.6980527Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:11.6981368Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:11.6982111Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:11.6998528Z AWS_REGION: us-east-1 2025-09-07T09:13:11.6998898Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:11.6999227Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:11.7004214Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:11.7004590Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:11.7004985Z ##[endgroup] 2025-09-07T09:13:11.7082518Z ##[group]Run actions/upload-artifact@v4 2025-09-07T09:13:11.7082796Z with: 2025-09-07T09:13:11.7083121Z name: test-jsons-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip 2025-09-07T09:13:11.7083534Z retention-days: 14 2025-09-07T09:13:11.7083766Z if-no-files-found: warn 2025-09-07T09:13:11.7083995Z path: test/**/*.json 2025-09-07T09:13:11.7084205Z compression-level: 6 2025-09-07T09:13:11.7084411Z overwrite: false 2025-09-07T09:13:11.7084614Z include-hidden-files: false 2025-09-07T09:13:11.7084843Z env: 2025-09-07T09:13:11.7085020Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:11.7085374Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:11.7085901Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:11.7086395Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:11.7087219Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 
--group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:11.7087945Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:11.7088187Z AWS_REGION: us-east-1 2025-09-07T09:13:11.7088454Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:11.7088763Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:11.7093723Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:11.7094171Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:11.7094718Z ##[endgroup] 2025-09-07T09:13:12.4887372Z With the provided path, there will be 8 files uploaded 2025-09-07T09:13:12.4893121Z Artifact name is valid! 2025-09-07T09:13:12.4894488Z Root directory input is valid! 2025-09-07T09:13:12.6552442Z Beginning upload of artifact content to blob storage 2025-09-07T09:13:12.9061539Z Uploaded bytes 46585 2025-09-07T09:13:12.9519439Z Finished uploading artifact content to blob storage! 2025-09-07T09:13:12.9522209Z SHA256 digest of uploaded artifact zip is 7d70a152d9172cd66bde57b3e235a943d0fbb6896bd6c6258a72876e8e7b9d50 2025-09-07T09:13:12.9523596Z Finalizing artifact upload 2025-09-07T09:13:13.0420672Z Artifact test-jsons-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip.zip successfully finalized. Artifact ID 3946755681 2025-09-07T09:13:13.0422542Z Artifact test-jsons-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip has been successfully uploaded! Final size is 46585 bytes. 
Artifact ID is 3946755681 2025-09-07T09:13:13.0427748Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754569/artifacts/3946755681 2025-09-07T09:13:13.0649751Z ##[group]Run actions/upload-artifact@v4 2025-09-07T09:13:13.0650086Z with: 2025-09-07T09:13:13.0650482Z name: test-reports-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip 2025-09-07T09:13:13.0650930Z retention-days: 14 2025-09-07T09:13:13.0651183Z if-no-files-found: ignore 2025-09-07T09:13:13.0651458Z path: test/**/*.xml test/**/*.csv 2025-09-07T09:13:13.0651735Z compression-level: 6 2025-09-07T09:13:13.0651967Z overwrite: false 2025-09-07T09:13:13.0652196Z include-hidden-files: false 2025-09-07T09:13:13.0652465Z env: 2025-09-07T09:13:13.0652666Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:13.0653068Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:13.0653622Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:13.0654229Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:13.0655093Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:13.0655860Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:13.0656116Z AWS_REGION: us-east-1 2025-09-07T09:13:13.0656427Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:13.0656761Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:13.0661787Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:13.0662162Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:13.0662566Z ##[endgroup] 2025-09-07T09:13:13.9036906Z With the provided path, there will be 564 files uploaded 2025-09-07T09:13:13.9037857Z Artifact name is valid! 2025-09-07T09:13:13.9039034Z Root directory input is valid! 
2025-09-07T09:13:14.0554416Z Beginning upload of artifact content to blob storage 2025-09-07T09:13:14.9378068Z Uploaded bytes 1014157 2025-09-07T09:13:14.9826840Z Finished uploading artifact content to blob storage! 2025-09-07T09:13:14.9829882Z SHA256 digest of uploaded artifact zip is 71281f2ccc477b45e6bc70c82f73e8fb09074bde3973cb18b16a576b9cfa96b2 2025-09-07T09:13:14.9831268Z Finalizing artifact upload 2025-09-07T09:13:15.0866237Z Artifact test-reports-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip.zip successfully finalized. Artifact ID 3946755751 2025-09-07T09:13:15.0868394Z Artifact test-reports-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip has been successfully uploaded! Final size is 1014157 bytes. Artifact ID is 3946755751 2025-09-07T09:13:15.0876774Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754569/artifacts/3946755751 2025-09-07T09:13:15.1124567Z ##[group]Run actions/upload-artifact@v4 2025-09-07T09:13:15.1124884Z with: 2025-09-07T09:13:15.1125220Z name: logs-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip 2025-09-07T09:13:15.1125641Z retention-days: 14 2025-09-07T09:13:15.1126091Z if-no-files-found: ignore 2025-09-07T09:13:15.1126370Z path: usage_log.txt test/**/*.log 2025-09-07T09:13:15.1126653Z compression-level: 6 2025-09-07T09:13:15.1126887Z overwrite: false 2025-09-07T09:13:15.1127127Z include-hidden-files: false 2025-09-07T09:13:15.1127390Z env: 2025-09-07T09:13:15.1127594Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:15.1127978Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:15.1128560Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:15.1129092Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:15.1130388Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin 
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:15.1131197Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:15.1131468Z AWS_REGION: us-east-1 2025-09-07T09:13:15.1131785Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:15.1132141Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:15.1137261Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:15.1137659Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:15.1138081Z ##[endgroup] 2025-09-07T09:13:15.9586396Z Multiple search paths detected. Calculating the least common ancestor of all paths 2025-09-07T09:13:15.9588487Z The least common ancestor is /var/home/pytorchci/actions-runner/_work/pytorch/pytorch. This will be the root directory of the artifact 2025-09-07T09:13:15.9589178Z With the provided path, there will be 555 files uploaded 2025-09-07T09:13:15.9593936Z Artifact name is valid! 2025-09-07T09:13:15.9595041Z Root directory input is valid! 2025-09-07T09:13:16.1042609Z Beginning upload of artifact content to blob storage 2025-09-07T09:13:17.0420874Z Uploaded bytes 1626566 2025-09-07T09:13:17.0875484Z Finished uploading artifact content to blob storage! 2025-09-07T09:13:17.0878491Z SHA256 digest of uploaded artifact zip is 245dc2aed9c246472ca0c56ce9635b9524d9c12efeb32189c3843285a7103fcb 2025-09-07T09:13:17.0880003Z Finalizing artifact upload 2025-09-07T09:13:17.1899522Z Artifact logs-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip.zip successfully finalized. Artifact ID 3946755845 2025-09-07T09:13:17.1901510Z Artifact logs-runattempt1-test-slow-1-2-linux.rocm.gpu.2_49774352868.zip has been successfully uploaded! Final size is 1626566 bytes. 
Artifact ID is 3946755845 2025-09-07T09:13:17.1907296Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754569/artifacts/3946755845 2025-09-07T09:13:17.2170413Z ##[group]Run # shellcheck disable=SC2156 2025-09-07T09:13:17.2170840Z # shellcheck disable=SC2156 2025-09-07T09:13:17.2171378Z find . -iname "core.[1-9]*" -exec docker exec "${CONTAINER_NAME}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-09-07T09:13:17.2210360Z shell: /usr/bin/bash -e {0} 2025-09-07T09:13:17.2210658Z env: 2025-09-07T09:13:17.2210880Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:17.2211287Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:17.2211880Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:17.2212432Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:17.2213337Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:17.2214252Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:17.2214543Z AWS_REGION: us-east-1 2025-09-07T09:13:17.2214866Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:17.2215242Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:17.2220270Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:17.2220672Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:17.2221255Z ##[endgroup] 2025-09-07T09:13:17.6320586Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-09-07T09:13:17.6321090Z with: 2025-09-07T09:13:17.6321470Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_upload-benchmark-results 2025-09-07T09:13:17.6321935Z role-duration-seconds: 18000 2025-09-07T09:13:17.6322216Z aws-region: us-east-1 2025-09-07T09:13:17.6322475Z audience: sts.amazonaws.com 
2025-09-07T09:13:17.6322753Z env: 2025-09-07T09:13:17.6322968Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:17.6323371Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:17.6324122Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:17.6324658Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:17.6325583Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:17.6326368Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:17.6326640Z AWS_REGION: us-east-1 2025-09-07T09:13:17.6326950Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:17.6327320Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:17.6332369Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:17.6332773Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:17.6333198Z ##[endgroup] 2025-09-07T09:13:17.9233025Z Assuming role with OIDC 2025-09-07T09:13:18.1344197Z Authenticated as assumedRoleId AROAUPVRELQNA5GQHA6IA:GitHubActions 2025-09-07T09:13:18.1968533Z ##[group]Run pytorch/test-infra/.github/actions/upload-benchmark-results@main 2025-09-07T09:13:18.1969018Z with: 2025-09-07T09:13:18.1969284Z benchmark-results-dir: test/test-reports 2025-09-07T09:13:18.1969611Z dry-run: false 2025-09-07T09:13:18.1969861Z schema-version: v3 2025-09-07T09:13:18.1970482Z github-token: *** 2025-09-07T09:13:18.1970745Z env: 2025-09-07T09:13:18.1970983Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:18.1971394Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:18.1971992Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:18.1972527Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:18.1973416Z 
GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:18.1974380Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:18.1974672Z AWS_REGION: us-east-1 2025-09-07T09:13:18.1974967Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:18.1975332Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:18.1980554Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:18.1980951Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:18.1981391Z ##[endgroup] 2025-09-07T09:13:18.2000897Z ##[group]Run set -eux 2025-09-07T09:13:18.2001199Z set -eux 2025-09-07T09:13:18.2001429Z  2025-09-07T09:13:18.2001659Z if [[ -n "" ]]; then 2025-09-07T09:13:18.2001938Z  source "" 2025-09-07T09:13:18.2002190Z fi 2025-09-07T09:13:18.2002550Z python3 -mpip install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0 2025-09-07T09:13:18.2002952Z  2025-09-07T09:13:18.2003188Z DEVICE_NAME="" 2025-09-07T09:13:18.2003464Z DEVICE_TYPE="" 2025-09-07T09:13:18.2003723Z  2025-09-07T09:13:18.2003976Z if command -v nvidia-smi; then 2025-09-07T09:13:18.2004429Z  # NB: I'm using PyTorch here to get the device name, however, it needs to 2025-09-07T09:13:18.2004984Z  # install the correct version of PyTorch manually for now. Any PyTorch 2025-09-07T09:13:18.2005657Z  # version is fine, I just use 2.7.1 to satify PYPIDEP linter 2025-09-07T09:13:18.2006085Z  python3 -mpip install torch==2.7.1 2025-09-07T09:13:18.2006442Z elif command -v rocminfo; then 2025-09-07T09:13:18.2006870Z  # NB: Installing torch on ROCm runner with pip here causes CI to fail 2025-09-07T09:13:18.2007380Z  # with a memoryview is too large error only on MI300 runners. Is pip 2025-09-07T09:13:18.2007894Z  # version on ROCm runner there too old? 
As a workaround, let's use the 2025-09-07T09:13:18.2008360Z  # GPU device name coming from rocminfo instead 2025-09-07T09:13:18.2008852Z  DEVICE_NAME=rocm 2025-09-07T09:13:18.2009323Z  DEVICE_TYPE=$(rocminfo | grep "Marketing Name" | tail -n1 | awk -F':' '{print $2}' | xargs) 2025-09-07T09:13:18.2009791Z fi 2025-09-07T09:13:18.2010013Z  2025-09-07T09:13:18.2010296Z echo "DEVICE_NAME=$DEVICE_NAME" >> $GITHUB_ENV 2025-09-07T09:13:18.2010689Z echo "DEVICE_TYPE=$DEVICE_TYPE" >> $GITHUB_ENV 2025-09-07T09:13:18.2050129Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:18.2050505Z env: 2025-09-07T09:13:18.2050737Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:18.2051158Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:18.2051746Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:18.2052293Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:18.2053437Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:18.2054361Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:18.2054657Z AWS_REGION: us-east-1 2025-09-07T09:13:18.2054978Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:18.2055357Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:18.2060576Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:18.2060986Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:18.2061435Z ##[endgroup] 2025-09-07T09:13:18.2128163Z + [[ -n '' ]] 2025-09-07T09:13:18.2128537Z + python3 -mpip install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0 2025-09-07T09:13:18.5136299Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T09:13:18.6247940Z Requirement already satisfied: boto3==1.35.33 in 
/var/home/pytorchci/.local/lib/python3.10/site-packages (1.35.33) 2025-09-07T09:13:18.6253130Z Requirement already satisfied: psutil==7.0.0 in /var/home/pytorchci/.local/lib/python3.10/site-packages (7.0.0) 2025-09-07T09:13:18.6259408Z Requirement already satisfied: pynvml==12.0.0 in /var/home/pytorchci/.local/lib/python3.10/site-packages (12.0.0) 2025-09-07T09:13:18.6300597Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3/dist-packages (from boto3==1.35.33) (0.10.0) 2025-09-07T09:13:18.6307257Z Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /var/home/pytorchci/.local/lib/python3.10/site-packages (from boto3==1.35.33) (0.10.4) 2025-09-07T09:13:18.6312366Z Requirement already satisfied: botocore<1.36.0,>=1.35.33 in /var/home/pytorchci/.local/lib/python3.10/site-packages (from boto3==1.35.33) (1.35.99) 2025-09-07T09:13:18.6491898Z Requirement already satisfied: nvidia-ml-py<13.0.0a0,>=12.0.0 in /var/home/pytorchci/.local/lib/python3.10/site-packages (from pynvml==12.0.0) (12.575.51) 2025-09-07T09:13:18.6546788Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /var/home/pytorchci/.local/lib/python3.10/site-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (2.8.2) 2025-09-07T09:13:18.6556854Z Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (1.26.5) 2025-09-07T09:13:18.6609129Z Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.33->boto3==1.35.33) (1.16.0) 2025-09-07T09:13:19.0618674Z + DEVICE_NAME= 2025-09-07T09:13:19.0619116Z + DEVICE_TYPE= 2025-09-07T09:13:19.0619545Z + command -v nvidia-smi 2025-09-07T09:13:19.0619997Z + command -v rocminfo 2025-09-07T09:13:19.0620636Z /opt/rocm/bin/rocminfo 2025-09-07T09:13:19.0621137Z + DEVICE_NAME=rocm 2025-09-07T09:13:19.0632728Z ++ rocminfo 2025-09-07T09:13:19.0634545Z ++ grep 'Marketing Name' 
2025-09-07T09:13:19.0638755Z ++ tail -n1 2025-09-07T09:13:19.0639009Z ++ awk -F: '{print $2}' 2025-09-07T09:13:19.0642066Z ++ xargs 2025-09-07T09:13:19.2107427Z + DEVICE_TYPE='AMD Instinct MI250X/MI250' 2025-09-07T09:13:19.2108099Z + echo DEVICE_NAME=rocm 2025-09-07T09:13:19.2108668Z + echo 'DEVICE_TYPE=AMD Instinct MI250X/MI250' 2025-09-07T09:13:19.2137130Z ##[group]Run set -eux 2025-09-07T09:13:19.2137439Z set -eux 2025-09-07T09:13:19.2137679Z  2025-09-07T09:13:19.2137946Z if [[ -z "${GITHUB_TOKEN}" ]]; then 2025-09-07T09:13:19.2138329Z  echo "Missing github-token input" 2025-09-07T09:13:19.2138643Z  exit 1 2025-09-07T09:13:19.2138902Z fi 2025-09-07T09:13:19.2183611Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:19.2184010Z env: 2025-09-07T09:13:19.2184271Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:19.2184711Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:19.2185312Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:19.2185888Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:19.2187123Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:19.2187939Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:19.2188248Z AWS_REGION: us-east-1 2025-09-07T09:13:19.2188606Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:19.2188979Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:19.2194221Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:19.2194646Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:19.2195101Z DEVICE_NAME: rocm 2025-09-07T09:13:19.2195390Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:19.2195853Z GITHUB_TOKEN: *** 2025-09-07T09:13:19.2196114Z ##[endgroup] 2025-09-07T09:13:19.2257274Z + [[ -z 
*** ]] 2025-09-07T09:13:19.2304040Z ##[group]Run pytorch/test-infra/.github/actions/get-workflow-job-id@main 2025-09-07T09:13:19.2304486Z with: 2025-09-07T09:13:19.2304861Z github-token: *** 2025-09-07T09:13:19.2305121Z env: 2025-09-07T09:13:19.2305350Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:19.2305797Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:19.2306407Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:19.2306970Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:19.2307883Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:19.2308697Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:19.2308995Z AWS_REGION: us-east-1 2025-09-07T09:13:19.2309306Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:19.2309686Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:19.2314917Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:19.2315340Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:19.2315776Z DEVICE_NAME: rocm 2025-09-07T09:13:19.2316048Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:19.2316347Z ##[endgroup] 2025-09-07T09:13:19.2332339Z ##[group]Run set -eux 2025-09-07T09:13:19.2332638Z set -eux 2025-09-07T09:13:19.2333026Z  2025-09-07T09:13:19.2333482Z python3 "${GITHUB_ACTION_PATH}/../../scripts/get_workflow_job_id.py" "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T09:13:19.2372330Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:19.2372717Z env: 2025-09-07T09:13:19.2372950Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:19.2373387Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:19.2374081Z RUNNER_TEST_RESULTS_DIR: 
/var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:19.2374631Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:19.2375702Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:19.2376525Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:19.2376817Z AWS_REGION: us-east-1 2025-09-07T09:13:19.2377143Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:19.2377531Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:19.2382758Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:19.2383171Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:19.2383619Z DEVICE_NAME: rocm 2025-09-07T09:13:19.2383896Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:19.2384301Z GITHUB_TOKEN: *** 2025-09-07T09:13:19.2384560Z ##[endgroup] 2025-09-07T09:13:19.2446133Z + python3 /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/get-workflow-job-id/../../scripts/get_workflow_job_id.py 17524754569 gpu6c07 2025-09-07T09:13:20.0772421Z setting job-id=49774352868 2025-09-07T09:13:20.0773341Z setting job-name=linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T09:13:20.0929144Z ##[group]Run set -eux 2025-09-07T09:13:20.0929454Z set -eux 2025-09-07T09:13:20.0929689Z  2025-09-07T09:13:20.0929936Z if [[ -n "" ]]; then 2025-09-07T09:13:20.0930214Z  source "" 2025-09-07T09:13:20.0930500Z fi 2025-09-07T09:13:20.0930739Z  2025-09-07T09:13:20.0931138Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_metadata.py" \ 2025-09-07T09:13:20.0931641Z  --schema-version "${SCHEMA_VERSION}" \ 2025-09-07T09:13:20.0931979Z  --repo "${REPO}" \ 2025-09-07T09:13:20.0932300Z  --head-branch "${HEAD_BRANCH}" \ 2025-09-07T09:13:20.0932632Z  --head-sha "${HEAD_SHA}" \ 
2025-09-07T09:13:20.0932965Z  --workflow-id "${WORKFLOW_RUN_ID}" \ 2025-09-07T09:13:20.0933328Z  --run-attempt "${RUN_ATTEMPT}" \ 2025-09-07T09:13:20.0933654Z  --job-id "${JOB_ID}" \ 2025-09-07T09:13:20.0934033Z  --job-name "${JOB_NAME}" 2025-09-07T09:13:20.0972112Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:20.0972485Z env: 2025-09-07T09:13:20.0972713Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:20.0973126Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:20.0973722Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:20.0974356Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:20.0975246Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:20.0976042Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:20.0976327Z AWS_REGION: us-east-1 2025-09-07T09:13:20.0976668Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:20.0977040Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:20.0982235Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:20.0982643Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:20.0983087Z DEVICE_NAME: rocm 2025-09-07T09:13:20.0983539Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:20.0983854Z SCHEMA_VERSION: v3 2025-09-07T09:13:20.0984114Z REPO: pytorch/pytorch 2025-09-07T09:13:20.0984382Z HEAD_BRANCH: refs/heads/main 2025-09-07T09:13:20.0984714Z HEAD_SHA: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:13:20.0985061Z WORKFLOW_RUN_ID: 17524754569 2025-09-07T09:13:20.0985330Z RUN_ATTEMPT: 1 2025-09-07T09:13:20.0985570Z JOB_ID: 49774352868 2025-09-07T09:13:20.0985966Z JOB_NAME: linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm) 2025-09-07T09:13:20.0986415Z 
##[endgroup] 2025-09-07T09:13:20.1045171Z + [[ -n '' ]] 2025-09-07T09:13:20.1046878Z + python3 /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_metadata.py --schema-version v3 --repo pytorch/pytorch --head-branch refs/heads/main --head-sha 93fb23d6fae7c4e82c4239a1033e522088742634 --workflow-id 17524754569 --run-attempt 1 --job-id 49774352868 --job-name 'linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm)' 2025-09-07T09:13:20.1375601Z ##[group]Run set -eux 2025-09-07T09:13:20.1375879Z set -eux 2025-09-07T09:13:20.1376101Z  2025-09-07T09:13:20.1376332Z if [[ -n "" ]]; then 2025-09-07T09:13:20.1376603Z  source "" 2025-09-07T09:13:20.1376848Z fi 2025-09-07T09:13:20.1377070Z  2025-09-07T09:13:20.1377450Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_runners_info.py" 2025-09-07T09:13:20.1412513Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:20.1413154Z env: 2025-09-07T09:13:20.1413394Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:20.1413814Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:20.1414481Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:20.1415050Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:20.1415991Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:20.1416798Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:20.1417106Z AWS_REGION: us-east-1 2025-09-07T09:13:20.1417440Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:20.1417810Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:20.1423032Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:20.1423460Z CONTAINER_NAME: 
9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:20.1423932Z DEVICE_NAME: rocm 2025-09-07T09:13:20.1424204Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:20.1424523Z ##[endgroup] 2025-09-07T09:13:20.1478891Z + [[ -n '' ]] 2025-09-07T09:13:20.1479687Z + python3 /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_runners_info.py 2025-09-07T09:13:21.7203524Z ##[group]Run set -eux 2025-09-07T09:13:21.7203837Z set -eux 2025-09-07T09:13:21.7204076Z  2025-09-07T09:13:21.7204344Z # TODO (huydhn): Implement this part 2025-09-07T09:13:21.7204747Z echo "dependencies={}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:13:21.7244899Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:21.7245297Z env: 2025-09-07T09:13:21.7245563Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:21.7245989Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:21.7246625Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:21.7247195Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:21.7248087Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:21.7249081Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:21.7249375Z AWS_REGION: us-east-1 2025-09-07T09:13:21.7249712Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:21.7250093Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:21.7255403Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:21.7255819Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:21.7256258Z DEVICE_NAME: rocm 2025-09-07T09:13:21.7256538Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:21.7256840Z ##[endgroup] 
2025-09-07T09:13:21.7316563Z + echo 'dependencies={}' 2025-09-07T09:13:21.7340182Z ##[group]Run set -eux 2025-09-07T09:13:21.7340475Z set -eux 2025-09-07T09:13:21.7340713Z  2025-09-07T09:13:21.7340932Z if [[ -n "" ]]; then 2025-09-07T09:13:21.7341226Z  source "" 2025-09-07T09:13:21.7341476Z fi 2025-09-07T09:13:21.7341711Z  2025-09-07T09:13:21.7341997Z if [[ ! -d "${BENCHMARK_RESULTS_DIR}" ]]; then 2025-09-07T09:13:21.7342460Z  echo "${BENCHMARK_RESULTS_DIR} does not exist, skipping" 2025-09-07T09:13:21.7342948Z  # We don't want the job to fail if the directory doesn't exist 2025-09-07T09:13:21.7343351Z  exit 0 2025-09-07T09:13:21.7343592Z fi 2025-09-07T09:13:21.7343818Z  2025-09-07T09:13:21.7344084Z if [[ "${DRY_RUN}" == "true" ]]; then 2025-09-07T09:13:21.7344566Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-09-07T09:13:21.7345447Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-09-07T09:13:21.7345899Z  --metadata "${BENCHMARK_METADATA}" \ 2025-09-07T09:13:21.7346256Z  --runners "${RUNNER_INFO}" \ 2025-09-07T09:13:21.7346608Z  --dependencies "${DEPENDENCIES}" \ 2025-09-07T09:13:21.7346944Z  --dry-run 2025-09-07T09:13:21.7347206Z else 2025-09-07T09:13:21.7347598Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-09-07T09:13:21.7348118Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-09-07T09:13:21.7348525Z  --metadata "${BENCHMARK_METADATA}" \ 2025-09-07T09:13:21.7348884Z  --runners "${RUNNER_INFO}" \ 2025-09-07T09:13:21.7349225Z  --dependencies "${DEPENDENCIES}" 2025-09-07T09:13:21.7349529Z fi 2025-09-07T09:13:21.7384567Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:21.7384956Z env: 2025-09-07T09:13:21.7385196Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:21.7385630Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:21.7386231Z RUNNER_TEST_RESULTS_DIR: 
/var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:21.7386793Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:21.7387707Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:21.7388519Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:21.7388812Z AWS_REGION: us-east-1 2025-09-07T09:13:21.7389147Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:21.7389531Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:21.7394771Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:21.7395186Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:21.7395647Z DEVICE_NAME: rocm 2025-09-07T09:13:21.7395922Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:21.7396273Z BENCHMARK_RESULTS_DIR: test/test-reports 2025-09-07T09:13:21.7396598Z DRY_RUN: false 2025-09-07T09:13:21.7397717Z BENCHMARK_METADATA: {"timestamp": 1757236400, "schema_version": "v3", "name": "linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "93fb23d6fae7c4e82c4239a1033e522088742634", "workflow_id": 17524754569, "run_attempt": 1, "job_id": 49774352868} 2025-09-07T09:13:21.7399407Z RUNNER_INFO: [{"cpu_info": "x86_64", "cpu_count": 128, "avail_mem_in_gb": 1007, "extra_info": {"hostname": "gpu6c07.jax.cs.cpe.ice.amd.com"}, "name": "rocm", "type": "AMD Instinct MI250X/MI250"}] 2025-09-07T09:13:21.7400085Z DEPENDENCIES: {} 2025-09-07T09:13:21.7400345Z ##[endgroup] 2025-09-07T09:13:21.7457425Z + [[ -n '' ]] 2025-09-07T09:13:21.7457743Z + [[ ! 
-d test/test-reports ]] 2025-09-07T09:13:21.7458221Z + [[ false == \t\r\u\e ]] 2025-09-07T09:13:21.7460687Z + python3 /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py --benchmark-results-dir test/test-reports --metadata '{"timestamp": 1757236400, "schema_version": "v3", "name": "linux-jammy-rocm-py3.10 / test (slow, 1, 2, linux.rocm.gpu.2, module:rocm)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "93fb23d6fae7c4e82c4239a1033e522088742634", "workflow_id": 17524754569, "run_attempt": 1, "job_id": 49774352868}' --runners '[{"cpu_info": "x86_64", "cpu_count": 128, "avail_mem_in_gb": 1007, "extra_info": {"hostname": "gpu6c07.jax.cs.cpe.ice.amd.com"}, "name": "rocm", "type": "AMD Instinct MI250X/MI250"}]' --dependencies '{}' 2025-09-07T09:13:21.9089709Z /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py:236: UserWarning: {'included': [{'test_file': 'test_ci_sanity_check_fail'}, {'test_file': 'higher_order_ops/test_with_effects'}, {'test_file': 'test_utils'}, {'test_file': 'inductor/test_compiled_optimizers'}, {'test_file': 'test_jit_fuser_te'}, {'test_file': 'export/test_torchbind'}, {'test_file': 'dynamo/test_modes'}, {'test_file': 'dynamo/test_structured_trace'}, {'test_file': 'test_testing'}, {'test_file': 'test_ops'}, {'test_file': 'test_reductions'}, {'test_file': 'inductor/test_aot_inductor'}, {'test_file': 'inductor/test_kernel_benchmark'}, {'test_file': 'functorch/test_rearrange'}, {'test_file': 'test_package'}, {'test_file': 'functorch/test_parsing'}, {'test_file': 'inductor/test_extension_backend'}, {'test_file': 'export/test_retraceability'}, {'test_file': 'export/test_export_strict'}, {'test_file': 'inductor/test_triton_extension_backend'}, {'test_file': 'inductor/test_triton_syntax'}, {'test_file': 'test_autoload'}, 
{'test_file': 'dynamo/test_deque_reconstruct'}, {'test_file': 'test_utils_config_module'}, {'test_file': 'test_mkl_verbose'}, {'test_file': 'export/test_unflatten_training_ir'}, {'test_file': 'inductor/test_cpp_wrapper_hipify'}, {'test_file': 'inductor/test_external_callables'}, {'test_file': 'inductor/test_aot_inductor_arrayref'}, {'test_file': 'inductor/test_remote_cache'}, {'test_file': 'export/test_export_training_ir_to_run_decomp'}, {'test_file': 'inductor/test_segmented_tree'}, {'test_file': 'export/test_serdes'}, {'test_file': 'inductor/test_compiled_autograd'}, {'test_file': 'test_comparison_utils'}, {'test_file': 'inductor/test_provenance_tracing'}, {'test_file': 'export/test_functionalized_assertions'}, {'test_file': 'test_license'}, {'test_file': 'dynamo/test_base_output'}, {'test_file': 'inductor/test_triton_kernels'}, {'test_file': 'test_mkldnn_verbose'}, {'test_file': 'inductor/test_inductor_utils'}, {'test_file': 'inductor/test_flex_decoding'}, {'test_file': 'cpp_extensions/torch_stable_test_extension/torch_stable_test/test_torch_stable'}, {'test_file': 'inductor/test_analysis'}, {'test_file': 'test_extension_utils'}, {'test_file': 'test_rename_privateuse1_to_existing_device'}, {'test_file': 'inductor/test_cutedsl_template'}, {'test_file': 'inductor/test_ck_backend'}, {'test_file': 'inductor/test_memory_planning'}, {'test_file': 'export/test_export_with_inline_and_install'}, {'test_file': 'dynamo/test_skip_guard_eval_unsafe'}, {'test_file': 'inductor/test_inplace_padding'}, {'test_file': 'dynamo/test_buffers_override'}, {'test_file': 'test_custom_ops'}, {'test_file': 'inductor/test_flex_attention'}, {'test_file': 'inductor/test_b2b_gemm'}, {'test_file': 'functorch/test_ac_logging'}, {'test_file': 'inductor/test_inductor_annotations'}, {'test_file': 'dynamo/test_resume'}, {'test_file': 'inductor/test_template_heuristics_registry'}, {'test_file': 'inductor/test_debug_trace'}, {'test_file': 'test_ao_sparsity'}, {'test_file': 
'inductor/test_cutlass_backend'}, {'test_file': 'test_cpp_api_parity'}, {'test_file': 'inductor/test_async_compile'}, {'test_file': 'dynamo/test_nops'}, {'test_file': 'torch_np/test_nep50_examples'}, {'test_file': 'torch_np/test_binary_ufuncs'}, {'test_file': 'inductor/test_best_config'}, {'test_file': 'test_hop_infra'}, {'test_file': 'torch_np/test_unary_ufuncs'}, {'test_file': 'inductor/test_aot_inductor_package'}, {'test_file': 'inductor/test_triton_cpu_backend'}, {'test_file': 'inductor/test_pad_mm'}, {'test_file': 'typing/test_python_operators'}, {'test_file': 'inductor/test_aot_inductor_custom_ops'}, {'test_file': 'inductor/test_cudagraph_trees'}, {'test_file': 'inductor/test_compile_worker'}, {'test_file': 'dynamo/test_modules'}, {'test_file': 'test_transformers'}, {'test_file': 'dynamo/test_global'}, {'test_file': 'export/test_export'}, {'test_file': 'test_foreach'}, {'test_file': 'test_appending_byte_serializer'}, {'test_file': 'test_fx_experimental'}, {'test_file': 'inductor/test_triton_wrapper'}, {'test_file': 'inductor/test_torchinductor_strided_blocks'}, {'test_file': 'test_file_check'}, {'test_file': 'dynamo/test_interop'}, {'test_file': 'dynamo/test_metrics_context'}, {'test_file': 'test_functionalization'}, {'test_file': 'dynamo/test_inline_and_install'}, {'test_file': 'inductor/test_smoke'}, {'test_file': 'torch_np/test_ufuncs_basic'}, {'test_file': 'test_proxy_tensor'}, {'test_file': 'inductor/test_fx_fusion'}, {'test_file': 'inductor/test_move_constructors_to_cuda'}, {'test_file': 'dynamo/test_skip_non_tensor'}, {'test_file': 'export/test_tree_utils'}, {'test_file': 'dynamo/test_frame_init'}, {'test_file': 'test_fx'}, {'test_file': 'torch_np/test_dtype'}, {'test_file': 'inductor/test_indexing'}, {'test_file': 'inductor/test_minifier_utils'}, {'test_file': 'test_typing'}, {'test_file': 'test_transformers_privateuse1'}, {'test_file': 'functorch/test_aot_joint_with_descriptors'}, {'test_file': 'test_utils_filelock'}, {'test_file': 
'inductor/test_torchinductor'}, {'test_file': 'inductor/test_metrics'}, {'test_file': 'inductor/test_coordinate_descent_tuner'}, {'test_file': 'inductor/test_foreach'}, {'test_file': 'backends/xeon/test_launch'}, {'test_file': 'dynamo/test_functions'}, {'test_file': 'inductor/test_torchinductor_opinfo'}, {'test_file': 'dynamo/test_dicts'}, {'test_file': 'dynamo/test_sdpa'}, {'test_file': 'dynamo/test_list'}, {'test_file': 'inductor/test_autoheuristic'}, {'test_file': 'test_flop_counter'}, {'test_file': 'xpu/test_fusion'}, {'test_file': 'dynamo/test_fx_graph_runnable'}, {'test_file': 'inductor/test_ordered_set'}, {'test_file': 'dynamo/test_recompiles'}, {'test_file': 'test_per_overload_api'}, {'test_file': 'inductor/test_xpu_basic'}, {'test_file': 'export/test_cpp_serdes'}, {'test_file': 'inductor/test_utils'}, {'test_file': 'inductor/test_cuda_repro'}, {'test_file': 'test_pytree'}, {'test_file': 'inductor/test_fp8'}, {'test_file': 'dynamo/test_nested_graph_breaks'}, {'test_file': 'dynamo/test_pre_dispatch'}, {'test_file': 'dynamo/test_fx_passes_pre_grad'}, {'test_file': 'test_openreg'}, {'test_file': 'inductor/test_combo_kernels'}, {'test_file': 'inductor/test_gpu_cpp_wrapper'}, {'test_file': 'inductor/test_device_assert'}, {'test_file': 'inductor/test_op_completeness'}, {'test_file': 'export/test_tools'}, {'test_file': 'export/test_export_opinfo'}, {'test_file': 'dynamo/test_subgraphs'}, {'test_file': 'dynamo/test_dynamic_shapes'}, {'test_file': 'inductor/test_compile_subprocess'}, {'test_file': 'profiler/test_kineto'}, {'test_file': 'inductor/test_subgraph_choice'}, {'test_file': 'dynamo/test_utils'}, {'test_file': 'inductor/test_codecache'}, {'test_file': 'test_logging'}, {'test_file': 'test_expanded_weights'}, {'test_file': 'inductor/test_static_cuda_launcher'}, {'test_file': 'torch_np/test_random'}, {'test_file': 'inductor/test_triton_heuristics'}, {'test_file': 'export/test_schema'}, {'test_file': 'dynamo/test_reconstruct'}, {'test_file': 
'inductor/test_helion_kernels'}, {'test_file': 'test_compile_benchmark_util'}, {'test_file': 'inductor/test_aot_inductor_utils'}, {'test_file': 'inductor/test_benchmark_fusion'}, {'test_file': 'inductor/test_cpu_cpp_wrapper'}, {'test_file': 'export/test_upgrader'}, {'test_file': 'higher_order_ops/test_invoke_subgraph'}, {'test_file': 'test_optim'}, {'test_file': 'export/test_passes'}, {'test_file': 'inductor/test_kernel_optimization'}, {'test_file': 'test_namedtensor'}, {'test_file': 'inductor/test_minifier'}, {'test_file': 'dynamo/test_autograd_function'}, {'test_file': 'inductor/test_profiler'}, {'test_file': 'inductor/test_select_algorithm'}, {'test_file': 'inductor/test_alignment'}, {'test_file': 'dynamo/test_config'}, {'test_file': 'dynamo/test_compile'}, {'test_file': 'test_openmp'}, {'test_file': 'inductor/test_torchinductor_codegen_dynamic_shapes'}, {'test_file': 'functorch/test_ops'}, {'test_file': 'test_import_stats'}, {'test_file': 'test_binary_ufuncs'}, {'test_file': 'lazy/test_bindings'}, {'test_file': 'test_fx_passes'}, {'test_file': 'export/test_db'}, {'test_file': 'inductor/test_group_batch_fusion'}, {'test_file': 'inductor/test_pattern_matcher'}, {'test_file': 'cpp_extensions/python_agnostic_extension/test/test_python_agnostic'}, {'test_file': 'torch_np/numpy_tests/core/test_scalarinherit'}, {'test_file': 'inductor/test_graph_transform_observer'}, {'test_file': 'test_show_pickle'}, {'test_file': 'dynamo/test_repros'}, {'test_file': 'inductor/test_fuzzer'}, {'test_file': 'inductor/test_quantization'}, {'test_file': 'test_native_functions'}, {'test_file': 'dynamo/test_install_free_tensors'}, {'test_file': 'functorch/test_aotdispatch'}, {'test_file': 'dynamo/test_graph_region_tracker'}, {'test_file': 'inductor/test_cooperative_reductions'}, {'test_file': 'inductor/test_inplacing_pass'}, {'test_file': 'dynamo/test_pgo'}, {'test_file': 'inductor/test_inductor_scheduler'}, {'test_file': 'inductor/test_cpu_select_algorithm'}, {'test_file': 
'inductor/test_codegen_triton'}, {'test_file': 'export/test_package'}, {'test_file': 'inductor/test_cudacodecache'}, {'test_file': 'dynamo/test_export'}, {'test_file': 'inductor/test_custom_post_grad_passes'}, {'test_file': 'test_hub'}, {'test_file': 'dynamo/test_view'}, {'test_file': 'test_module_tracker'}, {'test_file': 'dynamo/test_after_aot'}, {'test_file': 'test_complex'}, {'test_file': 'test_meta'}, {'test_file': 'xpu/test_gemm'}, {'test_file': 'test_tensorexpr'}, {'test_file': 'inductor/test_halide'}, {'test_file': 'higher_order_ops/test_invoke_quant'}, {'test_file': 'inductor/test_online_softmax'}, {'test_file': 'inductor/test_split_cat_fx_passes'}, {'test_file': 'test_cuda_expandable_segments'}, {'test_file': 'test_type_hints'}, {'test_file': 'dynamo/test_unittest'}, {'test_file': 'inductor/test_max_autotune'}, {'test_file': 'dynamo/test_guard_serialization'}, {'test_file': 'functorch/test_minifier'}, {'test_file': 'test_legacy_vmap'}, {'test_file': 'dynamo/test_cudagraphs_expandable_segments'}, {'test_file': 'test_multiprocessing'}, {'test_file': 'torch_np/numpy_tests/core/test_einsum'}, {'test_file': 'inductor/test_benchmarking'}, {'test_file': 'dynamo/test_model_output'}, {'test_file': 'torch_np/test_basic'}, {'test_file': 'test_segment_reductions'}, {'test_file': 'test_ops_fwd_gradients'}, {'test_file': 'inductor/test_compile'}, {'test_file': 'test_dispatch'}, {'test_file': 'test_pruning_op'}, {'test_file': 'inductor/test_multi_kernel'}, {'test_file': 'inductor/test_decompose_mem_bound_mm'}, {'test_file': 'inductor/test_block_analysis'}, {'test_file': 'inductor/test_minifier_isolate'}, {'test_file': 'export/test_swap'}, {'test_file': 'functorch/test_dims'}, {'test_file': 'profiler/test_profiler'}, {'test_file': 'inductor/test_op_dtype_prop'}, {'test_file': 'test_tensorexpr_pybind'}, {'test_file': 'inductor/test_split_cat_fx_aten_passes'}, {'test_file': 'dynamo/test_misc'}, {'test_file': 'inductor/test_loop_ordering'}, {'test_file': 
'inductor/test_torchinductor_dynamic_shapes'}, {'test_file': 'inductor/test_cutlass_evt'}, {'test_file': 'dynamo/test_sets'}, {'test_file': 'test_numpy_interop'}, {'test_file': 'inductor/test_cudagraph_trees_expandable_segments'}, {'test_file': 'dynamo/test_backward_higher_order_ops'}, {'test_file': 'inductor/test_torchinductor_codegen_config_overrides'}, {'test_file': 'test_nestedtensor'}, {'test_file': 'dynamo/test_export_mutations'}, {'test_file': 'inductor/test_scatter_optimization'}, {'test_file': 'xpu/test_conv'}, {'test_file': 'test_ops_jit'}, {'test_file': 'torch_np/numpy_tests/core/test_multiarray'}, {'test_file': 'inductor/test_perf'}, {'test_file': 'test_jit'}, {'test_file': 'inductor/test_layout_optim'}, {'test_file': 'nn/test_multihead_attention'}, {'test_file': 'inductor/test_binary_folding'}, {'test_file': 'inductor/test_snode_runtime'}, {'test_file': 'distributions/test_constraints'}, {'test_file': 'functorch/test_ac_knapsack'}, {'test_file': 'profiler/test_record_function'}, {'test_file': 'export/test_serialize'}, {'test_file': 'test_ops_gradients'}, {'test_file': 'functorch/test_vmap'}, {'test_file': 'dynamo/test_flat_apply'}, {'test_file': 'export/test_unflatten'}, {'test_file': 'test_jiterator'}, {'test_file': 'lazy/test_step_closures'}, {'test_file': 'test_namedtuple_return_api'}, {'test_file': 'inductor/test_memory'}, {'test_file': 'test_monitor'}, {'test_file': 'functorch/test_logging'}, {'test_file': 'test_stateless'}, {'test_file': 'torch_np/numpy_tests/core/test_numeric'}, {'test_file': 'test_weak'}, {'test_file': 'test_cpp_extensions_mtia_backend'}, {'test_file': 'inductor/test_mkldnn_pattern_matcher'}, {'test_file': 'test_jit_disabled'}, {'test_file': 'dynamo/test_optimizers'}, {'test_file': 'functorch/test_ac'}, {'test_file': 'inductor/test_dependencies'}, {'test_file': 'test_content_store'}, {'test_file': 'inductor/test_inductor_freezing'}, {'test_file': 'inductor/test_custom_lowering'}, {'test_file': 'inductor/test_control_flow'}, 
{'test_file': 'dynamo/test_profiler'}, {'test_file': 'optim/test_lrscheduler'}, {'test_file': 'test_fake_tensor'}, {'test_file': 'inductor/test_needs_exact_strides'}, {'test_file': 'inductor/test_config'}, {'test_file': 'dynamo/test_sources'}, {'test_file': 'test_cuda_trace'}, {'test_file': 'dynamo/test_base_hop'}, {'test_file': 'inductor/test_fused_attention'}, {'test_file': 'export/test_nativert'}, {'test_file': 'inductor/test_padding'}, {'test_file': 'inductor/test_torchbind'}, {'test_file': 'dynamo/test_backends'}, {'test_file': 'dynamo/test_verify_correctness'}, {'test_file': 'dynamo/test_python_dispatcher'}, {'test_file': 'test_set_default_mobile_cpu_allocator'}, {'test_file': 'torch_np/test_indexing'}, {'test_file': 'torch_np/test_scalars_0D_arrays'}, {'test_file': 'test_cpp_extensions_stream_and_event'}, {'test_file': 'test_numba_integration'}, {'test_file': 'dynamo/test_cudagraphs'}, {'test_file': 'dynamo/test_deviceguard'}, {'test_file': 'torch_np/numpy_tests/lib/test_function_base'}, {'test_file': 'test_tensorboard'}, {'test_file': 'dynamo/test_higher_order_ops'}, {'test_file': 'dynamo/test_comptime'}, {'test_file': 'test_datapipe'}, {'test_file': 'dynamo/test_logging'}, {'test_file': 'dynamo/test_debug_utils'}, {'test_file': 'test_out_dtype_op'}, {'test_file': 'functorch/test_eager_transforms'}, {'test_file': 'export/test_hop'}, {'test_file': 'profiler/test_cpp_thread'}, {'test_file': 'dynamo/test_aot_autograd_cache'}, {'test_file': 'inductor/test_auto_functionalize'}, {'test_file': 'torch_np/test_function_base'}, {'test_file': 'dynamo/test_activation_checkpointing'}, {'test_file': 'cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic'}, {'test_file': 'dynamo/test_aot_autograd'}, {'test_file': 'dynamo/test_graph_deduplication'}, {'test_file': 'test_model_exports_to_core_aten'}, {'test_file': 'test_itt'}, {'test_file': 'test_modules'}, {'test_file': 'test_python_dispatch'}, {'test_file': 'test_tensor_creation_ops'}, {'test_file': 
'test_cuda_sanitizer'}, {'test_file': 'inductor/test_cpu_repro'}, {'test_file': 'inductor/test_efficient_conv_bn_eval'}, {'test_file': 'dynamo/test_error_messages'}, {'test_file': 'test_cuda'}, {'test_file': 'dynamo/test_trace_rules'}, {'test_file': 'inductor/test_unbacked_symints'}, {'test_file': 'dynamo/test_package'}, {'test_file': 'inductor/test_mps_basic'}, {'test_file': 'test_autograd_fallback'}, {'test_file': 'torch_np/numpy_tests/core/test_indexing'}, {'test_file': 'dynamo/test_decorators'}, {'test_file': 'nn/test_lazy_modules'}, {'test_file': 'test_fx_reinplace_pass'}, {'test_file': 'torch_np/numpy_tests/lib/test_type_check'}, {'test_file': 'dynamo/test_compiler_bisector'}, {'test_file': 'inductor/test_custom_partitioner_fn'}, {'test_file': 'test_type_info'}, {'test_file': 'dynamo/test_unspec'}, {'test_file': 'lazy/test_functionalization'}, {'test_file': 'dynamo/test_aot_compile'}, {'test_file': 'test_functionalization_of_rng_ops'}, {'test_file': 'test_subclass'}, {'test_file': 'test_decomp'}, {'test_file': 'dynamo/test_einops'}, {'test_file': 'dynamo/test_callback'}, {'test_file': 'nn/test_parametrization'}, {'test_file': 'test_masked'}, {'test_file': 'export/test_experimental'}, {'test_file': 'nn/test_pruning'}, {'test_file': 'export/test_converter'}, {'test_file': 'test_bundled_inputs'}, {'test_file': 'inductor/test_fxir_backend'}, {'test_file': 'torch_np/numpy_tests/lib/test_histograms'}, {'test_file': 'test_maskedtensor'}, {'test_file': 'test_autograd'}, {'test_file': 'dynamo/test_reorder_logs'}, {'test_file': 'dynamo/test_exceptions'}, {'test_file': 'export/test_lift_unlift'}, {'test_file': 'dynamo/test_torchrec'}, {'test_file': 'test_public_bindings'}, {'test_file': 'dynamo/test_exc'}, {'test_file': 'test_sparse_semi_structured'}, {'test_file': 'dynamo/test_input_attr_tracking'}, {'test_file': 'functorch/test_control_flow'}, {'test_file': 'test_matmul_cuda'}, {'test_file': 'test_dataloader'}, {'test_file': 'test_sympy_utils'}, {'test_file': 
'inductor/test_mmdecomp'}, {'test_file': 'test_schema_check'}, {'test_file': 'export/test_pass_infra'}, {'test_file': 'dynamo/test_minifier'}, {'test_file': 'profiler/test_execution_trace'}, {'test_file': 'torch_np/numpy_tests/core/test_scalarmath'}, {'test_file': 'benchmark_utils/test_benchmark_utils'}, {'test_file': 'optim/test_swa_utils'}, {'test_file': 'dynamo/test_ctx_manager'}, {'test_file': 'dynamo/test_guard_manager'}, {'test_file': 'optim/test_optim'}, {'test_file': 'lazy/test_ts_opinfo'}, {'test_file': 'dynamo/test_recompile_ux'}, {'test_file': 'test_futures'}, {'test_file': 'dynamo/test_bytecode_utils'}, {'test_file': 'test_dynamic_shapes'}, {'test_file': 'functorch/test_vmap_registrations'}, {'test_file': 'dynamo/test_precompile_context'}, {'test_file': 'torch_np/numpy_tests/core/test_dtype'}, {'test_file': 'dynamo/test_fake_distributed'}, {'test_file': 'inductor/test_distributed_patterns'}, {'test_file': 'test_autocast'}, {'test_file': 'torch_np/numpy_tests/core/test_shape_base'}, {'test_file': 'dynamo/test_hooks'}, {'test_file': 'nn/test_packed_sequence'}, {'test_file': 'export/test_verifier'}, {'test_file': 'export/test_sparse'}, {'test_file': 'dynamo/test_generator'}, {'test_file': 'test_torch'}, {'test_file': 'functorch/test_memory_efficient_fusion'}, {'test_file': 'test_serialization'}, {'test_file': 'test_shape_ops'}, {'test_file': 'lazy/test_generator'}, {'test_file': 'test_numa_binding'}, {'test_file': 'torch_np/numpy_tests/lib/test_twodim_base'}, {'test_file': 'torch_np/numpy_tests/lib/test_arraypad'}, {'test_file': 'test_accelerator'}, {'test_file': 'torch_np/numpy_tests/core/test_getlimits'}, {'test_file': 'nn/test_embedding'}, {'test_file': 'torch_np/numpy_tests/fft/test_helper'}, {'test_file': 'nn/test_dropout'}, {'test_file': 'test_functional_optim'}, {'test_file': 'test_indexing'}, {'test_file': 'torch_np/numpy_tests/fft/test_pocketfft'}, {'test_file': 'torch_np/test_ndarray_methods'}, {'test_file': 'dynamo/test_subclasses'}, 
{'test_file': 'test_sort_and_select'}, {'test_file': 'torch_np/numpy_tests/lib/test_index_tricks'}, {'test_file': 'torch_np/numpy_tests/lib/test_shape_base_'}, {'test_file': 'test_cpp_extensions_jit'}, {'test_file': 'test_vulkan'}, {'test_file': 'torch_np/numpy_tests/linalg/test_linalg'}, {'test_file': 'nn/test_load_state_dict'}, {'test_file': 'export/test_draft_export'}, {'test_file': 'test_jit_llga_fuser'}, {'test_file': 'test_native_mha'}, {'test_file': 'test_cuda_primary_ctx'}, {'test_file': 'nn/test_module_hooks'}, {'test_file': 'test_view_ops'}, {'test_file': 'test_xnnpack_integration'}, {'test_file': 'test_mkldnn'}, {'test_file': 'torch_np/numpy_tests/core/test_dlpack'}, {'test_file': 'test_linalg'}, {'test_file': 'test_nn'}, {'test_file': 'test_mkldnn_fusion'}, {'test_file': 'test_sparse_csr'}, {'test_file': 'test_scatter_gather_ops'}, {'test_file': 'dynamo/test_python_autograd'}, {'test_file': 'torch_np/numpy_tests/core/test_scalar_methods'}, {'test_file': 'torch_np/numpy_tests/core/test_numerictypes'}, {'test_file': 'profiler/test_memory_profiler'}, {'test_file': 'nn/test_pooling'}, {'test_file': 'test_unary_ufuncs'}, {'test_file': 'lazy/test_debug_util'}, {'test_file': 'test_multiprocessing_spawn'}, {'test_file': 'nn/test_convolution'}, {'test_file': 'nn/test_init'}, {'test_file': 'torch_np/numpy_tests/lib/test_arraysetops'}, {'test_file': 'test_functional_autograd_benchmark'}, {'test_file': 'test_overrides'}, {'test_file': 'test_function_schema'}, {'test_file': 'test_cuda_multigpu'}, {'test_file': 'test_sparse'}, {'test_file': 'test_mobile_optimizer'}, {'test_file': 'test_type_promotion'}, {'test_file': 'torch_np/test_reductions'}, {'test_file': 'test_dlpack'}, {'test_file': 'torch_np/numpy_tests/core/test_scalar_ctors'}, {'test_file': 'profiler/test_profiler_tree'}, {'test_file': 'test_spectral_ops'}, {'test_file': 'test_prims'}, {'test_file': 'test_jit_autocast'}, {'test_file': 'profiler/test_torch_tidy'}, {'test_file': 'profiler/test_python_tracer'}, 
{'test_file': 'lazy/test_reuse_ir'}, {'test_file': 'distributions/test_distributions'}, {'test_file': 'test_quantization'}, {'test_file': 'doctests'}, {'test_file': 'test_autoload_disable'}, {'test_file': 'test_autoload_enable'}, {'test_file': 'test_cpp_extensions_aot_ninja'}, {'test_file': 'test_cpp_extensions_aot_no_ninja'}], 'excluded': []} from test/test-reports/td_exclusions-1195dc7ea8a509e45f7f.json is not a benchmark record, skipping 2025-09-07T09:13:21.9147304Z warn(f"{result} from {filepath} is not a benchmark record, skipping") 2025-09-07T09:13:21.9148726Z /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py:236: UserWarning: {'included': [{'test_file': 'lazy/test_ts_opinfo'}], 'excluded': []} from test/test-reports/td_exclusions-70601aa362a8f19c7288.json is not a benchmark record, skipping 2025-09-07T09:13:21.9150132Z warn(f"{result} from {filepath} is not a benchmark record, skipping") 2025-09-07T09:13:21.9153235Z /var/home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py:236: UserWarning: {'included': [{'test_file': 'cpp/Dict_test'}, {'test_file': 'cpp/Dimname_test'}, {'test_file': 'cpp/NamedTensor_test'}, {'test_file': 'cpp/apply_utils_test'}, {'test_file': 'cpp/atest'}, {'test_file': 'cpp/basic'}, {'test_file': 'cpp/broadcast_test'}, {'test_file': 'cpp/cpu_generator_test'}, {'test_file': 'cpp/dlconvertor_test'}, {'test_file': 'cpp/extension_backend_test'}, {'test_file': 'cpp/lazy_tensor_test'}, {'test_file': 'cpp/legacy_vmap_test'}, {'test_file': 'cpp/native_test'}, {'test_file': 'cpp/operators_test'}, {'test_file': 'cpp/scalar_tensor_test'}, {'test_file': 'cpp/scalar_test'}, {'test_file': 'cpp/tensor_iterator_test'}, {'test_file': 'cpp/undefined_tensor_test'}, {'test_file': 'cpp/wrapdim_test'}], 'excluded': []} from 
test/test-reports/td_exclusions-debdb281f26f9e497a9b.json is not a benchmark record, skipping 2025-09-07T09:13:21.9156319Z warn(f"{result} from {filepath} is not a benchmark record, skipping") 2025-09-07T09:13:21.9282641Z Prepare all required actions 2025-09-07T09:13:21.9283091Z Getting action download info 2025-09-07T09:13:21.9310848Z ##[group]Run ./.github/actions/teardown-rocm 2025-09-07T09:13:21.9311177Z env: 2025-09-07T09:13:21.9311400Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:21.9311816Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:21.9312396Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:21.9312951Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:21.9314015Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:21.9314814Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:21.9315099Z AWS_REGION: us-east-1 2025-09-07T09:13:21.9315423Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:21.9315835Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:21.9321068Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:21.9321490Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:21.9321943Z DEVICE_NAME: rocm 2025-09-07T09:13:21.9322204Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:21.9322520Z ##[endgroup] 2025-09-07T09:13:21.9338813Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T09:13:21.9339359Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T09:13:21.9339788Z # shellcheck disable=SC2046 2025-09-07T09:13:21.9340128Z docker stop $(docker ps -q) || true 2025-09-07T09:13:21.9340485Z # Prune all stopped containers. 
2025-09-07T09:13:21.9340838Z docker container prune -f 2025-09-07T09:13:21.9374243Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:21.9374640Z env: 2025-09-07T09:13:21.9374899Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:21.9375344Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:21.9375964Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:21.9376542Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:21.9377443Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:21.9378257Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:21.9378555Z AWS_REGION: us-east-1 2025-09-07T09:13:21.9378861Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:21.9379241Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:21.9384476Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:21.9384898Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:21.9385343Z DEVICE_NAME: rocm 2025-09-07T09:13:21.9385623Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:21.9385932Z ##[endgroup] 2025-09-07T09:13:32.7854722Z 9972aab73550 2025-09-07T09:13:44.1633305Z Deleted Containers: 2025-09-07T09:13:44.1634083Z 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:44.1634630Z 2025-09-07T09:13:44.1634838Z Total reclaimed space: 11.45GB 2025-09-07T09:13:44.1697326Z Prepare all required actions 2025-09-07T09:13:44.1728138Z ##[group]Run ./.github/actions/diskspace-cleanup 2025-09-07T09:13:44.1728483Z with: 2025-09-07T09:13:44.1728716Z diskspace-cutoff: 70 2025-09-07T09:13:44.1728998Z env: 2025-09-07T09:13:44.1729229Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:44.1729660Z RUNNER_ARTIFACT_DIR: 
/var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:44.1730268Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:44.1730810Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:44.1732184Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:44.1733005Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:44.1733300Z AWS_REGION: us-east-1 2025-09-07T09:13:44.1733651Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:44.1734125Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:44.1739355Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:44.1739760Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 2025-09-07T09:13:44.1740359Z DEVICE_NAME: rocm 2025-09-07T09:13:44.1740634Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:44.1740934Z ##[endgroup] 2025-09-07T09:13:44.1757231Z ##[group]Run set -ex 2025-09-07T09:13:44.1757541Z set -ex 2025-09-07T09:13:44.1757860Z diskspace_cutoff=70 2025-09-07T09:13:44.1758260Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-09-07T09:13:44.1758688Z if [ ! -d "$docker_root_dir" ]; then 2025-09-07T09:13:44.1759203Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-09-07T09:13:44.1759678Z  exit 0 2025-09-07T09:13:44.1759918Z fi 2025-09-07T09:13:44.1760336Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-09-07T09:13:44.1761152Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-09-07T09:13:44.1761846Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-09-07T09:13:44.1762230Z  docker system prune -af 2025-09-07T09:13:44.1762735Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-09-07T09:13:44.1763284Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-09-07T09:13:44.1763866Z  echo "Error: Available diskspace is less than $diskspace_cutoff percent. Not enough diskspace." 2025-09-07T09:13:44.1764382Z  echo "$msg" 2025-09-07T09:13:44.1764655Z  exit 1 2025-09-07T09:13:44.1764918Z  else 2025-09-07T09:13:44.1765221Z  difference=$((diskspace - diskspace_new)) 2025-09-07T09:13:44.1765623Z  echo "Diskspace saved: $difference percent" 2025-09-07T09:13:44.1766004Z  fi 2025-09-07T09:13:44.1766258Z fi 2025-09-07T09:13:44.1804053Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:13:44.1804434Z env: 2025-09-07T09:13:44.1804661Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:13:44.1805085Z RUNNER_ARTIFACT_DIR: /var/home/pytorchci/actions-runner/_work/_temp/artifacts 2025-09-07T09:13:44.1805705Z RUNNER_TEST_RESULTS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/test-results 2025-09-07T09:13:44.1806270Z RUNNER_DOCS_DIR: /var/home/pytorchci/actions-runner/_work/_temp/docs 2025-09-07T09:13:44.1807176Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 110 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:13:44.1807998Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:13:44.1808321Z AWS_REGION: us-east-1 2025-09-07T09:13:44.1808645Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:13:44.1809018Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:13:44.1814386Z AWS_SESSION_TOKEN: *** 2025-09-07T09:13:44.1814813Z CONTAINER_NAME: 9972aab7355040a857912593e4bf32852967b28c6efd493aca5391273fabf92f 
2025-09-07T09:13:44.1815260Z DEVICE_NAME: rocm 2025-09-07T09:13:44.1815521Z DEVICE_TYPE: AMD Instinct MI250X/MI250 2025-09-07T09:13:44.1815832Z ##[endgroup] 2025-09-07T09:13:44.1876449Z + diskspace_cutoff=70 2025-09-07T09:13:44.1882758Z ++ docker info -f '{{.DockerRootDir}}' 2025-09-07T09:13:44.2436183Z + docker_root_dir=/media/4TB/docker-rootless 2025-09-07T09:13:44.2436891Z + '[' '!' -d /media/4TB/docker-rootless ']' 2025-09-07T09:13:44.2448197Z ++ df -H --output=pcent /media/4TB/docker-rootless 2025-09-07T09:13:44.2452456Z ++ sed -n 2p 2025-09-07T09:13:44.2452962Z ++ sed s/%// 2025-09-07T09:13:44.2453610Z ++ sed 's/ //' 2025-09-07T09:13:44.2482273Z + diskspace=33 2025-09-07T09:13:44.2483343Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified' 2025-09-07T09:13:44.2484414Z + [[ 33 -ge 70 ]] 2025-09-07T09:13:44.2536924Z Post job cleanup. 2025-09-07T09:13:44.2583640Z Post job cleanup. 2025-09-07T09:13:44.3876945Z Post job cleanup. 2025-09-07T09:13:44.4252821Z Logging out of registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:13:44.4618353Z Post job cleanup. 2025-09-07T09:13:44.5936715Z Post job cleanup. 2025-09-07T09:13:44.5979988Z Post job cleanup. 
2025-09-07T09:13:44.6983511Z [command]/usr/bin/git version 2025-09-07T09:13:44.7027316Z git version 2.34.1 2025-09-07T09:13:44.7062386Z Copying '/var/home/pytorchci/.gitconfig' to '/var/home/pytorchci/actions-runner/_work/_temp/be3b3d58-a290-4462-ad9d-25509c9de937/.gitconfig' 2025-09-07T09:13:44.7071535Z Temporarily overriding HOME='/var/home/pytorchci/actions-runner/_work/_temp/be3b3d58-a290-4462-ad9d-25509c9de937' before making global git config changes 2025-09-07T09:13:44.7072435Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T09:13:44.7084575Z [command]/usr/bin/git config --global --add safe.directory /var/home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-09-07T09:13:44.7129174Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T09:13:44.7178010Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T09:13:44.7576423Z Entering 'android/libs/fbjni' 2025-09-07T09:13:44.7657168Z Entering 'third_party/FP16' 2025-09-07T09:13:44.7725589Z Entering 'third_party/FXdiv' 2025-09-07T09:13:44.7794971Z Entering 'third_party/NNPACK' 2025-09-07T09:13:44.7866291Z Entering 'third_party/NVTX' 2025-09-07T09:13:44.7944147Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:13:44.8013616Z Entering 'third_party/XNNPACK' 2025-09-07T09:13:44.8101865Z Entering 'third_party/aiter' 2025-09-07T09:13:44.8171873Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:13:44.8257575Z Entering 'third_party/benchmark' 2025-09-07T09:13:44.8332967Z Entering 'third_party/composable_kernel' 2025-09-07T09:13:44.8413789Z Entering 'third_party/cpp-httplib' 2025-09-07T09:13:44.8485760Z Entering 'third_party/cpuinfo' 2025-09-07T09:13:44.8555984Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:13:44.8626937Z Entering 'third_party/cutlass' 
2025-09-07T09:13:44.8710066Z Entering 'third_party/fbgemm' 2025-09-07T09:13:44.8788863Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:13:44.8864377Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:13:44.8936289Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:13:44.9007946Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:13:44.9095223Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:13:44.9161741Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:13:44.9230366Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:13:44.9310898Z Entering 'third_party/flash-attention' 2025-09-07T09:13:44.9380077Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:13:44.9457064Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:13:44.9538099Z Entering 'third_party/flatbuffers' 2025-09-07T09:13:44.9612105Z Entering 'third_party/fmt' 2025-09-07T09:13:44.9684764Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:13:44.9757431Z Entering 'third_party/gloo' 2025-09-07T09:13:44.9828823Z Entering 'third_party/googletest' 2025-09-07T09:13:44.9907487Z Entering 'third_party/ideep' 2025-09-07T09:13:44.9975548Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:13:45.0056934Z Entering 'third_party/ittapi' 2025-09-07T09:13:45.0126034Z Entering 'third_party/kineto' 2025-09-07T09:13:45.0196262Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:13:45.0266562Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:13:45.0332273Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:13:45.0398634Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:13:45.0468736Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:13:45.0535030Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:13:45.0611133Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:13:45.0681658Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:13:45.0759573Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:13:45.0830999Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:13:45.0906280Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:13:45.0967653Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:13:45.1042809Z Entering 'third_party/kleidiai' 2025-09-07T09:13:45.1114526Z Entering 'third_party/mimalloc' 2025-09-07T09:13:45.1190819Z Entering 'third_party/nlohmann' 2025-09-07T09:13:45.1261296Z Entering 'third_party/onnx' 2025-09-07T09:13:45.1347487Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:13:45.1426631Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:13:45.1491491Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:13:45.1561963Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:13:45.1630186Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:13:45.1696792Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:13:45.1767983Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:13:45.1833773Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:13:45.1898007Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:13:45.1964290Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:13:45.2043580Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:13:45.2116274Z 
Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:13:45.2218374Z Entering 'third_party/pocketfft' 2025-09-07T09:13:45.2290278Z Entering 'third_party/protobuf' 2025-09-07T09:13:45.2363548Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:13:45.2431984Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:13:45.2506152Z Entering 'third_party/psimd' 2025-09-07T09:13:45.2581888Z Entering 'third_party/pthreadpool' 2025-09-07T09:13:45.2644453Z Entering 'third_party/pybind11' 2025-09-07T09:13:45.2717307Z Entering 'third_party/python-peachpy' 2025-09-07T09:13:45.2786700Z Entering 'third_party/sleef' 2025-09-07T09:13:45.2857157Z Entering 'third_party/tensorpipe' 2025-09-07T09:13:45.2919847Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:13:45.3001719Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:13:45.3071576Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:13:45.3143426Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:13:45.3198703Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:13:45.3302982Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T09:13:45.3334087Z http.https://github.com/.extraheader 2025-09-07T09:13:45.3345545Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T09:13:45.3384715Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T09:13:45.3766904Z Entering 'android/libs/fbjni' 2025-09-07T09:13:45.3808434Z http.https://github.com/.extraheader 2025-09-07T09:13:45.3869458Z Entering 'third_party/FP16' 2025-09-07T09:13:45.3910819Z http.https://github.com/.extraheader 2025-09-07T09:13:45.3964604Z 
Entering 'third_party/FXdiv' 2025-09-07T09:13:45.4000487Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4054698Z Entering 'third_party/NNPACK' 2025-09-07T09:13:45.4099276Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4147518Z Entering 'third_party/NVTX' 2025-09-07T09:13:45.4189159Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4244488Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:13:45.4281950Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4335269Z Entering 'third_party/XNNPACK' 2025-09-07T09:13:45.4373454Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4445183Z Entering 'third_party/aiter' 2025-09-07T09:13:45.4480361Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4534790Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:13:45.4571282Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4640760Z Entering 'third_party/benchmark' 2025-09-07T09:13:45.4678352Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4734307Z Entering 'third_party/composable_kernel' 2025-09-07T09:13:45.4772207Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4837800Z Entering 'third_party/cpp-httplib' 2025-09-07T09:13:45.4881152Z http.https://github.com/.extraheader 2025-09-07T09:13:45.4936505Z Entering 'third_party/cpuinfo' 2025-09-07T09:13:45.4975320Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5031507Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:13:45.5071965Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5126116Z Entering 'third_party/cutlass' 2025-09-07T09:13:45.5161052Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5226486Z Entering 'third_party/fbgemm' 2025-09-07T09:13:45.5264017Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5320706Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:13:45.5357004Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5406370Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:13:45.5442074Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5503912Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:13:45.5542988Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5592107Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:13:45.5633468Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5696748Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:13:45.5733808Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5788964Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:13:45.5822185Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5870204Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:13:45.5912417Z http.https://github.com/.extraheader 2025-09-07T09:13:45.5972115Z Entering 'third_party/flash-attention' 2025-09-07T09:13:45.6009939Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6068911Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:13:45.6105559Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6165982Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:13:45.6200432Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6267833Z Entering 'third_party/flatbuffers' 2025-09-07T09:13:45.6305888Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6362427Z Entering 'third_party/fmt' 2025-09-07T09:13:45.6402981Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6458334Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:13:45.6502316Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6553707Z Entering 'third_party/gloo' 2025-09-07T09:13:45.6595854Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6649195Z Entering 'third_party/googletest' 2025-09-07T09:13:45.6684822Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6740352Z Entering 'third_party/ideep' 2025-09-07T09:13:45.6778671Z 
http.https://github.com/.extraheader 2025-09-07T09:13:45.6830839Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:13:45.6866935Z http.https://github.com/.extraheader 2025-09-07T09:13:45.6929483Z Entering 'third_party/ittapi' 2025-09-07T09:13:45.6966284Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7020674Z Entering 'third_party/kineto' 2025-09-07T09:13:45.7058526Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7111803Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:13:45.7147930Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7199147Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:13:45.7238987Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7291606Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:13:45.7324322Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7373079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:13:45.7404711Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7461033Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:13:45.7497647Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7548535Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:13:45.7588487Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7644641Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:13:45.7684365Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7741378Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:13:45.7778323Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7826843Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:13:45.7867981Z http.https://github.com/.extraheader 2025-09-07T09:13:45.7921679Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:13:45.7955085Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8008865Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:13:45.8047474Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8102997Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:13:45.8139375Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8195085Z Entering 'third_party/kleidiai' 2025-09-07T09:13:45.8235230Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8293260Z Entering 'third_party/mimalloc' 2025-09-07T09:13:45.8331693Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8385786Z Entering 'third_party/nlohmann' 2025-09-07T09:13:45.8428896Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8483991Z Entering 'third_party/onnx' 2025-09-07T09:13:45.8526010Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8600526Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:13:45.8637887Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8701028Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:13:45.8733978Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8787829Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:13:45.8829481Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8880525Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:13:45.8915337Z http.https://github.com/.extraheader 2025-09-07T09:13:45.8959532Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:13:45.8996767Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9050819Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:13:45.9085835Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9145056Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:13:45.9181338Z 
http.https://github.com/.extraheader 2025-09-07T09:13:45.9233766Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:13:45.9270911Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9321634Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:13:45.9358644Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9405633Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:13:45.9438409Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9494676Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:13:45.9531513Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9590024Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:13:45.9630874Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9707397Z Entering 'third_party/pocketfft' 2025-09-07T09:13:45.9743595Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9796806Z Entering 'third_party/protobuf' 2025-09-07T09:13:45.9838100Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9896021Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:13:45.9932039Z http.https://github.com/.extraheader 2025-09-07T09:13:45.9986749Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:13:46.0021390Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0073146Z Entering 'third_party/psimd' 2025-09-07T09:13:46.0113729Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0165930Z Entering 'third_party/pthreadpool' 2025-09-07T09:13:46.0204219Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0257021Z Entering 'third_party/pybind11' 2025-09-07T09:13:46.0299406Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0351114Z Entering 'third_party/python-peachpy' 2025-09-07T09:13:46.0391765Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0441696Z Entering 
'third_party/sleef' 2025-09-07T09:13:46.0480687Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0536060Z Entering 'third_party/tensorpipe' 2025-09-07T09:13:46.0574469Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0628033Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:13:46.0664131Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0716452Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:13:46.0753598Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0804860Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:13:46.0841766Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0890153Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:13:46.0929484Z http.https://github.com/.extraheader 2025-09-07T09:13:46.0985281Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:13:46.1017491Z http.https://github.com/.extraheader 2025-09-07T09:13:46.1264125Z Cleaning up orphan processes